Author Topic: Ultimate blockchain compression w/ trust-free lite nodes  (Read 87860 times)
etotheipi (OP)
Legendary
Core Armory Developer
June 17, 2012, 06:33:40 PM
Last edit: January 20, 2014, 04:54:23 PM by etotheipi
Merited by ABCbits (49), suchmoon (4), Husna QA (3)
 #1

This idea has been scattered throughout some other threads, but there is no one place that fully explains the idea with pictures.  I believe this addresses two major problems with the network at once -- compression/pruning, and lightweight-node security -- and does so in a non-disruptive way.  I am not positive that this is the right way to go, but it definitely warrants discussion.



Summary:  [SEE ILLUSTRATIONS BELOW]

Use a special tree data structure to organize all unspent-TxOuts on the network, and use the root of this tree to communicate its "signature" between nodes.  The leaves of this tree actually correspond to addresses/scripts, and the data at the leaf is actually a root of the unspent-TxOut list for that address/script.  To maintain security of the tree signatures, it will be included in the header of an alternate blockchain, which will be secured by merged mining.  

This provides the same compression as the simpler unspent-TxOut merkle tree, but also gives nodes a way to download just the unspent-TxOut list for each address in their wallet, and verify that list directly against the blockheaders.  Therefore, even lightweight nodes can get full address information, from any untrusted peer, and with only a tiny amount of downloaded data (a few kB).  

(NOTE:  I have illustrated everything as using straight merkle-trees, but as noted in the downsides/uncertainties section: a variant of the merkle-tree will have to be used that guarantees efficient updating of the tree.)


(1) Major Benefits:
  • (1a) Near-optimal blockchain compression:  theoretically, the size of the pruned blockchain would be proportional to the transaction volume (thus could go up or down), instead of the entire global history, which always increases in size.  In practice it wouldn't be so clean, but you really won't do any better than this.  Whoops! Before this idea was fully developed, I had overlooked the fact that full nodes will still have to maintain the transaction-indexed database.  This address-indexed DB is not a replacement, but would have to be maintained in addition to it.  Therefore, it necessarily increases the amount of work and data storage of a full node.  But it can simply be an "add-on" to an existing "ultraprune" implementation.  (Either way, this should actually be listed as a downside.)
  • (1b) Trustless lightweight-node support:  New nodes entering the network for the first time will only have to download a tiny amount of data to get full, verifiable knowledge of their balance and how to spend it (much of which can be stored between loads).  A single honest peer out of thousands guarantees you get, and recognize, good data.
  • (1c) Perfectly non-disruptive:  There are no main-network protocol or blockchain changes at all.  All the balance-tree information is maintained and verified in a separate blockchain through merged mining.  In fact, it's so non-disruptive, it could be implemented without any core-dev support at all (though I/we would like their involvement).
  • (1d) Efficient tree querying & updating:  The full-but-pruned nodes of the network will be able to maintain this data structure efficiently.  New blocks simply add or remove unspent coins from the tree, and all operations are "constant time and space" (there is an upper limit on how much time and space is required to prove inclusion of, insert, or delete a piece of data, no matter how big the network is).
  • (1e) No user setup or options:  Unlike overlay networks, achieving full trust does not require finding a trusted node or subscribing to a service.  Just like the main blockchain -- you find a bunch of random peers and get the longest chain.  This could be bootstrapped in a similar fashion as the main network.

(2) Downsides and Uncertainties:
  • (1a) See revised (1a) above
  • (2a) Complexity of concept:  This is not simple.  It's a second blockchain, requiring merged mining -- though if it is successful and supported by the community, it could be added to the network by requiring that miners compute and include the root hash of this data structure in the coinbase script (just like with block height).  This is entirely feasible, but it could be a bear to implement.
  • (2b) Uncertainties about lite-node bootstrap data:  Depending on how the data is structured, there may still be a fair amount of data for a lite node to download to get the full security of a full node.  It will, undoubtedly, be much less than downloading the entire chain.  But there are obvious implications if this security comes at the cost of 1 MB/wallet, or 100 MB/wallet (still better than 4 GB, as of this writing).  UPDATE: My initial estimate based on the "Hybrid PATRICIA/Briandais Tree" (aka Reiner-Tree) is that a wallet with 100 addresses could verify its own balance with about 250 kB.
  • (2c) [SEE UPDATE AT BOTTOM] Merkle-tree Alternative Needed: Vanilla merkle-trees will not work, because adding or removing single branches is likely to cause complete recomputation of the tree.  But it should be possible to create an alternative with the following properties:
    • Commutative computation:  a node should be able to get the same answer regardless of whether the tree is computed from scratch, or is based on updating a previous tree.
    • O(log(N)) updating: removing or adding a single leaf node should be doable in O(log(N)) time.  With a vanilla merkle tree, this is true only if you remove a node and add a node at the same leaf location.

(3) Assumptions:
  • (3a) Need verifiable tree roots:  I argue that a regular overlay network won't suffice, solely because it's too easy for malicious nodes to spread incorrect data and muck up the network.  If there are enough malicious nodes in an overlay network, it could make lite nodes that depend on it unusable.  I am assuming it is necessary to have a verifiable source for pruned-headers -- a separate blockchain succeeds because correctness of data is required for it to be accepted.
  • (3b) Merged mining does what we think it does: It is a secure way to maintain a separate blockchain, leveraging existing mining power.  
  • (3c) Efficient sorting:  Leaf nodes of the main tree will have to be sorted so that all nodes can arrive at the same answer.  However, this can be done using bucket-sort in O(N) time, because the leaf nodes are hashes which should be uniformly distributed.



Alt-Chain Merkle Tree construction:

-- For each address/script, collect all unspent-TxOuts
-- Compute merkle root of each TxOut tree
-- Sort roots, use as leaf nodes for a master-merkle-tree.  
-- Include merkle-root of master tree in alternate chain header.


https://dl.dropboxusercontent.com/u/1139081/BitcoinImg/ReinerAltChain/reinercompression.png
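
In Python-ish form, the whole construction is only a few lines.  This is a minimal sketch of the steps above -- the hash and serialization conventions here are placeholders for illustration, not a spec:

Code:
import hashlib

def H(b):
    # double-SHA256, as used elsewhere in Bitcoin
    return hashlib.sha256(hashlib.sha256(b).digest()).digest()

def merkle_root(leaves):
    # vanilla Merkle tree; the empty-list and odd-layer conventions are arbitrary here
    if not leaves:
        return H(b'')
    layer = list(leaves)
    while len(layer) > 1:
        if len(layer) % 2:
            layer.append(layer[-1])
        layer = [H(layer[i] + layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0]

def master_root(utxos_by_script):
    # utxos_by_script: {script: [serialized unspent TxOuts]}
    # one sub-root per address/script, then a master tree over the sorted sub-roots
    sub_roots = [merkle_root([H(txo) for txo in txos])
                 for txos in utxos_by_script.values()]
    return merkle_root(sorted(sub_roots))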



Getting your balance:

-- Download headers of both chains
-- Request unspent-TxOut-hash list.  
-- Compute sub-merkle root for this address
-- Request secondary-branch nodes  (O(log(N)))
-- Compute master root; compare to block header
-- Request the full TxOuts for each unspent-TxOut-hash above


https://dl.dropboxusercontent.com/u/1139081/BitcoinImg/ReinerAltChain/reinercompression2.png
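
The "compute master root; compare to block header" step is an ordinary Merkle-branch check.  A minimal sketch, reusing H() from the construction example (the (is_right, sibling) path encoding is made up for illustration):

Code:
def verify_branch(leaf_hash, branch, expected_root):
    # branch: (is_right_sibling, sibling_hash) pairs from the leaf up to the root
    node = leaf_hash
    for is_right, sibling in branch:
        node = H(node + sibling) if is_right else H(sibling + node)
    return node == expected_root

The same check runs twice: once from your TxOut hashes up to your address's sub-root, and once from that sub-root up to the master root in the alt-chain header.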



Alternate Chain:
All data is included on the alternate blockchain, which is maintained through merged mining on the main chain.  This is only one extra tx per block on the main chain.  That is the full extent of its impact on the main chain, and any nodes that are ignoring/unaware of the alt-chain.


https://dl.dropboxusercontent.com/u/1139081/BitcoinImg/ReinerAltChain/reinerchain.png



Yes, this is a huge undertaking.  Yes, there's a lot of uncertainties. Yes, I need a new merkle tree structure.
But, this idea would kill two massive birds with one stone (kill two albatrosses with one block?)

Alright, tear it apart!




UPDATE:

After lots and lots of discussion and debate, I believe that the address index should be maintained as a trie-like structure.  Others have expressed interest in a binary search tree (BST).  Either way, the structure can be adapted to have the same properties we desire of a merkle tree, but with a lot more flexibility, such as very quick insertion, deletion, querying, updating, etc.  My preference is the creme-de-la-creme of tries -- a hybrid of the PATRICIA tree (level-compressed trie) and the de la Briandais tree (node-compressed).  It looks something like this:


https://dl.dropboxusercontent.com/u/1139081/BitcoinImg/ReinerAltChain/DataStructures_Values.png

The structure would be indexed by TxOut script ("recipient"), and each node is recursively authenticated by the nodes below it.  The uniqueness of the trie structure guarantees that there is exactly one solution for a given set of TxOuts, which also means that only the existing set of TxOuts need to be obtained in order to create the trie (the BST requires replaying all transactions, in order, to have a well-defined internal structure).  For education on trie structures, see my pretty pictures in this post.
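
To see why insert-order independence falls out of a trie, here is a toy sketch -- a plain character-level trie, not the PATRICIA/Briandais hybrid, and the node serialization is invented:

Code:
import hashlib

def H(b):
    return hashlib.sha256(hashlib.sha256(b).digest()).digest()

def trie_root(keys):
    # Build a plain trie from a set of keys and hash it recursively.
    # The structure, and hence the root, depends only on the key set.
    def insert(node, key):
        if not key:
            node['$'] = {}          # end-of-key marker
        else:
            insert(node.setdefault(key[:1], {}), key[1:])
    def hash_node(node):
        # children hashed in sorted branch order, so the digest is
        # independent of the order in which keys were inserted
        return H(b''.join(k.encode() + hash_node(v)
                          for k, v in sorted(node.items())))
    root = {}
    for k in keys:
        insert(root, k)
    return hash_node(root)

assert trie_root(['abc', 'abd', 'x']) == trie_root(['x', 'abd', 'abc'])

The same key set always produces the same structure and therefore the same recursive hash -- which is exactly the property the BST approach has to work to get.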

Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
hazek
Legendary
June 17, 2012, 06:43:53 PM
 #2

Before I read this I just want to quickly post that, whether justifiably or unjustifiably, I personally feel like this is the most pressing issue when it comes to Bitcoin's successful future, and I really hope the core team has planned an order of priorities accordingly.

My personality type: INTJ - please forgive my weaknesses (Not naturally in tune with others feelings; may be insensitive at times, tend to respond to conflict with logic and reason, tend to believe I'm always right)

If however you enjoyed my post: 15j781DjuJeVsZgYbDVt2NZsGrWKRWFHpp
etotheipi (OP)
Legendary
Core Armory Developer
June 17, 2012, 06:47:49 PM
 #3

Before I read this I just want to quickly post that, whether justifiably or unjustifiably, I personally feel like this is the most pressing issue when it comes to Bitcoin's successful future, and I really hope the core team has planned an order of priorities accordingly.

I too believe this is a critical issue for Bitcoin as a whole.  I had floated the idea in the past that handling blockchain size was critical, but other issues seemed more pressing for the devs at the time -- I didn't have a solid idea to promote, and the blockchain size wasn't so out of hand yet.

One nice benefit of this solution is that because it's an alt-chain, technically no core devs have to be on board.  It can be done completely independently and operate completely non-disruptively, even with only the support of other devs who believe in it.  I'd certainly like to get core devs interested in it, as they are very smart people who probably have a lot of good ideas to add.  But one of the biggest upsides here is that it can be done completely independently.


socrates1024
Full Member
Andrew Miller
June 17, 2012, 07:22:01 PM
Last edit: June 17, 2012, 07:46:30 PM by socrates1024
Merited by ABCbits (2)
 #4

Let me try to explain a solution to the 'alternate Merkle tree' you require.

The basic idea is to use a balanced binary search tree, such as a red-black tree or a 2-3 tree. Updating such a data structure, including rebalancing, only requires accessing O(log N) tree nodes. A lite client would be able to verify such an update by receiving just the relevant nodes. There is never a need to recompute the entire tree from scratch. Balancing is strict, in that the worst-case length from the root to a leaf never exceeds O(log N).
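
(The hashing trick is tiny: every node's digest commits to its key and to both child digests, so the root digest commits to the entire tree, shape included.  A minimal sketch -- simplified, not the code from the repo below:

Code:
import hashlib

def H(b):
    return hashlib.sha256(b).digest()

EMPTY = H(b'empty')    # digest of an empty subtree

def node_digest(key, left_digest, right_digest):
    # commits to the key and both subtrees, so any change anywhere
    # in the tree changes the root digest
    return H(left_digest + key + right_digest)

def verify_path(root_digest, path):
    # path: (key, sibling_digest, went_right) triples from the bottom of
    # the search (an empty subtree) back up to the root, sent by the server
    d = EMPTY
    for key, sibling, went_right in path:
        d = node_digest(key, sibling, d) if went_right else node_digest(key, d, sibling)
    return d == root_digest

An update works the same way: the client checks the old path, applies the same O(log N) rebalancing rules locally, and recomputes the new root.)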

There's been a bunch of academic research on this topic, where it's known as "Authenticated Data Structures" [1,2,3]. Although the idea has been around for almost 15 years, I don't know of a single implementation. So I made a toy implementation available at https://github.com/amiller/redblackmerkle

I'm hoping to spread awareness of this technique, since it's pretty clear that some clever use of Merkle trees is going to be important in Bitcoin's future. Let's discuss this!


P.S. Most of these structures only require collision-resistant hash functions. However, if you want to use a fancy hash function with special (homomorphic) properties, you can make even more efficient structures, e.g. a Merkle 'bloom filter' [4].
 

[1] Certificate Revocation and Certificate Update
     Naor and Nissim, 1998. USENIX
     https://www.usenix.net/publications/library/proceedings/sec98/full_papers/nissim/nissim.pdf

[2] Authenticated Data Structures
     Roberto Tamassia, 2003.
     http://cs.brown.edu/research/pubs/pdfs/2003/Tamassia-2003-ADS.pdf

[3] Persistent Authenticated Data Structures and their applications
     Anagnostopoulos, Goodrich, and Tamassia
     http://cs.brown.edu/people/aris/pubs/pad.pdf

[4] Cryptography for efficiency: Authenticated Data Structures Based on Lattices and Parallel Online Memory Checking
     Papamanthou and Tamassia, 2011
     http://www.cse.msstate.edu/~ramkumar/gw-102.pdf

amiller on freenode / 19G6VFcV1qZJxe3Swn28xz3F8gDKTznwEM
[my twitter] [research@umd]
I study Merkle trees, credit networks, and Byzantine Consensus algorithms.
etotheipi (OP)
Legendary
Core Armory Developer
June 17, 2012, 07:33:39 PM
 #5

Let me try to explain a solution to the 'alternate Merkle tree' you require.

The basic idea is to use a balanced binary search tree, such as a red-black tree or a 2-3 tree. Updating such a data structure, including rebalancing, only requires accessing O(log N) tree nodes. A lite client would be able to verify such an update by receiving just the relevant nodes. There is never a need to recompute the entire tree from scratch. Balancing is strict, in that the worst-case length from the root to a leaf never exceeds O(log N).

There's been a bunch of academic research on this topic, where it's known as "Authenticated Data Structures" [1,2,3]. Although the idea has been around for almost 15 years, I don't know of a single implementation. So I made a toy implementation available at https://github.com/amiller/redblackmerkle

I'm hoping to spread awareness of this technique, since it's pretty clear that some clever use of Merkle trees is going to be important in Bitcoin's future. Let's discuss this!


P.S. Most of these structures only require collision-resistant hash functions. However, if you want to use a fancy hash function with special (homomorphic) properties, you can make even more efficient structures, e.g. a Merkle 'bloom filter' [4].
 

[1] Certificate Revocation and Certificate Update
     Naor and Nissim, 1998. USENIX
     https://www.usenix.net/publications/library/proceedings/sec98/full_papers/nissim/nissim.pdf

[2] Authenticated Data Structures
     Roberto Tamassia, 2003.
     http://cs.brown.edu/research/pubs/pdfs/2003/Tamassia-2003-ADS.pdf

[3] Persistent Authenticated Data Structures and their applications
     Anagnostopoulos, Goodrich, and Tamassia
     http://cs.brown.edu/people/aris/pubs/pad.pdf

[4] Cryptography for efficiency: Authenticated Data Structures Based on Lattices and Parallel Online Memory Checking
     Papamanthou and Tamassia, 2011
     http://www.cse.msstate.edu/~ramkumar/gw-102.pdf



Wow, thanks socrates!  

That's an excellent place to start.  Of course, I should've thought about that from the start, since I am adept with data structures, and especially trees/tries of sorts.  I have even coded a red-black tree before...  

My brain was stuck on how to modify the base merkle-tree concept into what I wanted, and didn't consider going back to a different (though related) data structure.

There is one problem, though it may be part of the materials you already referenced:  the tree must be constructed identically by all parties, and from any state.  And all binary-tree structures are insert-order dependent, unless you're storing them in some sort of trie.  Specifying an insert order doesn't work, because someone constructing from scratch doesn't know how someone updating from a previous tree will insert them.  But I wouldn't want to go to a basic trie, due to the space inefficiency.  Something like a PATRICIA tree/trie would probably work, but it's very difficult to implement (correctly) and there aren't a lot of existing, trusted implementations out there.

I'll dig into your materials a bit.  Thanks!


2112
Legendary
June 17, 2012, 07:50:25 PM
 #6

(2c) Merkle-tree Alternative Needed
I think this is the crucial observation. Bitcoin doesn't really have a single contiguous bitstream that would need the protection of a hash tree. It has a collection of transactions that are only weakly ordered in a lattice fashion.

The better data structure would probably be some classic database representation, like a B-tree with a cryptographically signed transaction log. To allow integrity verification with a truncated log, the log blocks should contain something like a hash of an inorder traversal of the database content before the update. This would allow for quick verification of the new log blocks received from the network.

To allow backward compatibility with the forward-delta block-chain, the Bitcoin protocol would need an additional constraint on the ordering of the transactions in the blocks.

Please comment, critique, criticize or ridicule BIP 2112: https://bitcointalk.org/index.php?topic=54382.0
Long-term mining prognosis: https://bitcointalk.org/index.php?topic=91101.0
socrates1024
Full Member
Andrew Miller
June 17, 2012, 08:12:00 PM
Merited by ABCbits (1)
 #7

There are two kinds of orderings here. First is the order in which updates are made to the Merkle trees. Each block is ordered, and within a block transactions are ordered, and within a transaction the txIns and txOuts are ordered. To update your Merkle trees as the result of a bitcoin transaction, you remove all the inputs, and insert all the new outputs. Everyone should be able to do this in the same order, right?


Now, within a Merkle tree, the order is determined by a key. Think of each Merkle tree as a database index. We would like to have at least two kinds of indexes, which means maintaining two instances of the Merkle tree:

1) TxOuts are identified by the transaction hash and an index within that transaction. So we need to search by (txhash,idx) in order to see if an output has been spent. When outputs are inserted into this tree, they're stored in sorted order according to (txhash,idx).

2) It's also desirable to find all the available txouts for a particular address. Let a second Merkle tree contain keys of the form (scriptpubkey). Now, given a root hash, you can ask for a verifiable list of all your spendable coins.


Alternately, instead of thinking of it as a different tree for each index, you can think of it as a composite structure. The general form is a Search DAG (Directed Acyclic Graph), but the idea is exactly the same [5]. (This includes B-Trees).
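
Concretely, with the merkle_root/H helpers from the sketch in the first post, the two indexes are just two sort orders over the same records (the key encodings are illustrative only):

Code:
def both_roots(utxos):
    # utxos: list of (txhash, idx, scriptpubkey, serialized_txout)
    by_outpoint = sorted((h + i.to_bytes(4, 'little'), txo)
                         for h, i, _, txo in utxos)
    by_script   = sorted((spk, txo)
                         for _, _, spk, txo in utxos)
    spent_index   = merkle_root([H(k + v) for k, v in by_outpoint])
    address_index = merkle_root([H(k + v) for k, v in by_script])
    return spent_index, address_index

A block (or alt-chain header) would then commit to both roots.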



[5] A General Model for Authenticated Data Structures
     Martel, Nuckolls, Devanbu, Gertz, Kwong, Stubblebine, 2004.
     http://truthsayer.cs.ucdavis.edu/algorithmica.pdf

proudhon
Legendary
June 17, 2012, 08:41:15 PM
 #8

No clue about any of this stuff, but thank you guys for working on this.  I think it's important to have this sorted out before adoption and use gets really heavy.

Bitcoin Fact: the price of bitcoin will not be greater than $70k for more than 25 consecutive days at any point in the rest of recorded human history.
unclescrooge
aka Raphy
Hero Member
June 17, 2012, 09:32:27 PM
 #9

I agree with others, this is a hot issue. And as much as I dislike gambling, I'm thankful SatoshiDice put pressure on the blockchain like this.
etotheipi (OP)
Legendary
Core Armory Developer
June 17, 2012, 09:35:40 PM
Last edit: June 17, 2012, 09:57:38 PM by etotheipi
Merited by ABCbits (1)
 #10

There are two kinds of orderings here. First is the order in which updates are made to the Merkle trees. Each block is ordered, and within a block transactions are ordered, and within a transaction the txIns and txOuts are ordered. To update your Merkle trees as the result of a bitcoin transaction, you remove all the inputs, and insert all the new outputs. Everyone should be able to do this in the same order, right?


Now, within a Merkle tree, the order is determined by a key. Think of each Merkle tree as a database index. We would like to have at least two kinds of indexes, which means maintaining two instances of the Merkle tree:

Consider 10 unspent-TxOuts, {0,1,2,3,4,5,6,7,8,9}.

You and I are both constructing the same binary/red-black tree, except that you are starting from scratch and I am starting from a tree where elements {0,1,2,5,6,9,13} are part of the tree.

I have to add {3,4,7,8} and remove {13} from my tree.   You just add all 10 elements in a specified order.  

We're going to get different roots.  I technically know how your tree would've been constructed, but I might have to start over and reinsert everything from scratch if I wanted to make sure my tree structure matches yours.  

That's what I mean about "commutative computation" -- we need to make sure that regardless of whether you're starting from scratch, or updating an existing tree from any point in the past, that you'll get the same answer.  

As I said before, a trie would work, but is generally very space inefficient, and that matters here.
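
You can see the problem with even a plain (unbalanced) BST in a few lines -- same final key set, different histories, different shapes, hence different root hashes (toy sketch):

Code:
def bst_shape(inserts):
    # insert keys into a plain BST, return its shape as nested tuples
    def ins(node, k):
        if node is None:
            return [k, None, None]
        side = 1 if k < node[0] else 2
        node[side] = ins(node[side], k)
        return node
    root = None
    for k in inserts:
        root = ins(root, k)
    def shape(n):
        return None if n is None else (n[0], shape(n[1]), shape(n[2]))
    return shape(root)

print(bst_shape([5, 2, 8, 1, 3, 7, 9, 0, 4, 6]) ==
      bst_shape(list(range(10))))   # False: same set {0..9}, different trees

Rebalancing rules don't remove the dependence on history; they just change which histories collide.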

casascius
Mike Caldwell
VIP
Legendary
The Casascius 1oz 10BTC Silver Round (w/ Gold B)
June 17, 2012, 10:04:31 PM
Merited by ABCbits (1)
 #11

Question: as users download this alt-chain, do they always download the alt-chain from its genesis block (imagine namecoin), or do they always only download a certain number of blocks back, after which point, all earlier blocks are discarded?  (imagine p2pool).

I would have to imagine it was the second.

If it was the first, the value of the alt-chain would tend to zero over time: it would only be useful for pruning spent transactions that existed at the time the alt-chain was created, as all the blocks representing diffs to the alt-chain would be proportional to the diffs on the main chain.  That is, at least, the way I understood it.

If it was the second, I would imagine that it would be more sensible to create a superblock (say, every 1000 blocks) that publishes a brand new tree, and then all non-superblocks that follow (up to the next superblock) would be updates to that tree.  Anyone downloading the alt-chain would never need more than 1000 blocks: one superblock and all the incremental blocks since the superblock was created.  Anything older than the superblock would be simply unnecessary.
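
The download rule under that scheme is trivial (a sketch; the 1000-block interval is just the example number above):

Code:
SUPERBLOCK_INTERVAL = 1000

def blocks_needed(tip_height):
    # latest superblock at or below the tip, plus every incremental
    # block after it -- never more than 1000 blocks total
    last_super = tip_height - (tip_height % SUPERBLOCK_INTERVAL)
    return [last_super] + list(range(last_super + 1, tip_height + 1))

print(len(blocks_needed(123456)))   # 457: superblock 123000 + 456 increments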

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
casascius
Mike Caldwell
VIP
Legendary
The Casascius 1oz 10BTC Silver Round (w/ Gold B)
June 17, 2012, 10:07:29 PM
 #12

I am assuming it is necessary to have a verifiable source for pruned-headers -- a separate blockchain succeeds because correctness of data is required to be accepted.

I believe you could provide this by publishing both your "genesis block" as well as the source code for the one-off utility you make to produce it.  One holding a complete copy of the Bitcoin block chain should be able to run your program and create a bit-for-bit copy of your genesis block.  Get a few people to do this, and to GPG-sign the hash of the resulting output.  Voila.

DiThi
Full Member
Firstbits: 1dithi
June 17, 2012, 10:23:04 PM
 #13

This idea has been scattered throughout some other threads, but there is no one place that fully explains the idea with pictures.

I did. Well, not exactly pictures, but ASCII drawings:

https://en.bitcoin.it/wiki/User:DiThi/MTUT

https://bitcointalk.org/index.php?topic=60911.msg709737#msg709737

I wanted to do a prototype but I have been very busy with other projects since then.

1DiThiTXZpNmmoGF2dTfSku3EWGsWHCjwt
etotheipi (OP)
Legendary
Core Armory Developer
June 17, 2012, 10:35:28 PM
 #14

Question: as users download this alt-chain, do they always download the alt-chain from its genesis block (imagine namecoin), or do they always only download a certain number of blocks back, after which point, all earlier blocks are discarded?  (imagine p2pool).

I would have to imagine it was the second.

If it was the first, the value of the alt-chain would tend to zero over time: it would only be useful for pruning spent transactions that existed at the time the alt-chain was created, as all the blocks representing diffs to the alt-chain would be proportional to the diffs on the main chain.  That is, at least, the way I understood it.

If it was the second, I would imagine that it would be more sensible to create a superblock (say, every 1000 blocks) that publishes a brand new tree, and then all non-superblocks that follow (up to the next superblock) would be updates to that tree.  Anyone downloading the alt-chain would never need more than 1000 blocks: one superblock and all the incremental blocks since the superblock was created.  Anything older than the superblock would be simply unnecessary.

The second blockchain actually would have no substance to it.  It would solely consist of headers containing the master-merkle roots.  The merkle-roots are created from the main-chain data, so you go get the data from that network.   There would be no block reward -- your reward is on the main chain, which you are mining at the same time.

So yes, everyone downloads the entire alt-chain.  But the entirety of this extra chain is the headers themselves:  so it's only about 4-5 MB per year.
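
The arithmetic, assuming the alt-chain headers stay around the size of Bitcoin's 80-byte headers:

Code:
blocks_per_year = 365.25 * 24 * 6       # one block per ~10 minutes
header_bytes = 80                       # assuming Bitcoin-sized headers
print(blocks_per_year * header_bytes / 1e6)   # ~4.2 MB per year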

I am assuming it is necessary to have a verifiable source for pruned-headers -- a separate blockchain succeeds because correctness of data is required to be accepted.

I believe you could provide this by publishing both your "genesis block" as well as the source code for the one-off utility you make to produce it.  One holding a complete copy of the Bitcoin block chain should be able to run your program and create a bit-for-bit copy of your genesis block.  Get a few people to do this, and to GPG-sign the hash of the resulting output.  Voila.

See (1e) in the original post:  the point of this exercise is to avoid having to trust specific nodes, GPG/PGP signatures, added centralization, etc.  You trust the proof-of-work.  The merkle-root with the most work behind it on the alt-chain is the merkle-root that you trust to be correct, just like you do with the main-network headers.

We don't want users to have to set up a list of trusted authorities.  Then you have revocation lists.  And keep the list updated.  And maintain GPG keys of such authorities.  And politics about who should be trusted authorities.  And of course, centralization.

Or you set up an alternate blockchain, and trust the data that has the most work behind it.  Voila.

maaku
Legendary
June 17, 2012, 11:32:21 PM
Last edit: June 18, 2012, 12:03:45 AM by maaku
Merited by ABCbits (1)
 #15

As others have mentioned, a self-balancing binary search tree would solve the only real technical issue here. Red-black trees would work fine, or their generalized parent structure the 2-3-4 tree (a B-tree of order 4), which would provide a conceptually cleaner implementation and serialization format.

Overall, great work. I can assist you in implementing it.

There is one problem, though it may be part of the materials you already referenced:  the tree must be constructed identically by all parties, and from any state.  And all binary-tree structures are insert-order dependent, unless you're storing them in some sort of trie.  Specifying an insert order doesn't work, because someone constructing from scratch doesn't know how someone updating from a previous tree will insert them.  But I wouldn't want to go to a basic trie, due to the space inefficiency.  Something like a PATRICIA tree/trie would probably work, but it's very difficult to implement (correctly) and there aren't a lot of existing, trusted implementations out there.
(EDIT: Sorry, this is basically what socrates said above. I should have read the whole thread first:)

This is a non-issue; simply specify the order of insertion/deletion. For example: “Process blocks in order; for each block process transactions in order; and for each transaction first delete all inputs (in order) from, then insert all outputs (in order) into the alt-chain Merkle tree”. You might have to throw a special case in there for transactions that have as input the output of a transaction that occurs later in the same block (is that even allowed?).

Why (and how) would you be creating a tree from scratch without either access to the blockchain or the tree in the last alt-chain checkpoint?

Quote from: etotheipi
Consider 10 unspent-TxOuts, {0,1,2,3,4,5,6,7,8,9}.

You and I are both constructing the same binary/red-black tree, except that you are starting from scratch and I am starting from a tree where elements {0,1,2,5,6,9,13} are part of the tree.

I have to add {3,4,7,8} and remove {13} from my tree.   You just add all 10 elements in a specified order.  

We're going to get different roots.  I technically know how your tree would've been constructed, but I might have to start over and reinsert everything from scratch if I wanted to make sure my tree structure matches yours.  

That's what I mean about "commutative computation" -- we need to make sure that regardless of whether you're starting from scratch, or updating an existing tree from any point in the past, that you'll get the same answer.

No, that's going in the wrong direction. Updates to the blockchain occur in atomic operations: blocks. Simply mandate that trees are constructed/updated according to the canonical ordering provided by the blockchain. If you insist on creating the search tree from scratch, simply replay the blockchain inserting and removing in the order specified therein. Or you can start from the tree of the last found alt-chain checkpoint, and replay insertions & deletions from that point forward.

Yes, order of operations matters, so standardize an order of operations.
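
In sketch form, the standardized update looks like this (tree.insert/delete/root_hash are whatever authenticated structure gets chosen -- hypothetical names):

Code:
def apply_block(tree, block):
    # canonical order: transactions as they appear in the block; within
    # each tx, delete the spent inputs first, then insert the new outputs
    for tx in block.transactions:
        for txin in tx.inputs:
            if not txin.is_coinbase:
                tree.delete((txin.prev_txhash, txin.prev_index))
        for idx, txout in enumerate(tx.outputs):
            tree.insert((tx.hash, idx), txout)
    return tree.root_hash()

Everyone who replays blocks through this loop lands on the same root, whatever their starting point.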

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
etotheipi (OP)
Legendary
Core Armory Developer
June 18, 2012, 12:03:41 AM
 #16

Two big issues brought up so far, in outside discussion:


(1):  I am currently in respectful disagreement with Socrates about the ability to construct stateless unspent-TxOut trees.   Recap:  If one were to use a red-black tree to store this info, then the exact structure of that tree would depend on the entire, exact, incremental history of additions and deletions to the tree starting from the genesis block.  To construct the tree from scratch, I must replay the entire history in the same order every time.

I am of the opinion that you should be able to start with the current list of unspent-TxOuts, however it is that you got them, and be able to construct the correct tree without knowing anything about the history of how elements were added and deleted.  With vanilla red-black trees, if I start with just the current unspent-TxOut list, I have every node in the tree -- but I need the entire history of already-spent TxOuts just to be able to replay that history to construct the tree correctly.  This seems inelegant and dangerous.  But I can't figure out why, other than gut instinct.

The counter argument is that you will never find yourself in this position:  you are either downloading and processing the whole chain, in which case you have no problem replaying the incremental updates.  Or you download from some checkpoint, in which case you can still replay the insertions and deletions from that checkpoint with minimal effort.


(2):  I misunderstood DiThi's original proposal as a much simpler tree structure that could not provide the trustless lite-node behavior that my trees do.  However, as I re-read it, I'm realizing that I think he's right -- a simpler tree of unspent TxOuts does actually give you that capability.  Is this right?

A couple months ago when I first theorized this idea, I had a very good reason for believing that you needed to aggregate the unspent-TxOuts (UTXOs) by address/script.  If you didn't do it, bad things would happen.  Unfortunately, I don't remember what those bad things were!  It's entirely reasonable that I mis-applied some logic in my head and decided I needed an unnecessarily-complex data structure to achieve security.

So, am I wrong?  I'm not sure.  What are the weaknesses and advantages of DiThi's tree structure vs. mine (his leaf nodes are UTXOs, mine are roots of UTXO subtrees)?  One thing I can think of is:  how do you know that a peer gave you the complete list of UTXOs and is not hiding the rest?  Though, I've already made an assumption that you have at least one honest peer, so that's probably not an issue... is it?


BrightAnarchist
Donator
Legendary
June 18, 2012, 12:09:42 AM
 #17

listening... very good ideas here, similar to my balance chain concept :) but much more fleshed out
maaku
Legendary
June 18, 2012, 12:17:43 AM
 #18

Quote
The counter argument is that you will never find yourself in this position:  you are either downloading and processing the whole chain, in which case you have no problem replaying the incremental updates.  Or you download from some checkpoint, in which case you can still replay the insertions and deletions from that checkpoint with minimal effort.
Correct. So why does it matter? I'm not sure why it's “inelegant” either, since as you point out there is no use case for recreating the tree from only an unsorted list of outputs anyway.

Quote
A couple months ago when I first theorized this idea, I had a very good reason for believing that you needed to aggregate the unspent-TxOuts (UTXOs) by address/script.  If you didn't do it, bad things would happen.  Unfortunately, I don't remember what those bad things were!  It's entirely reasonable that I mis-applied some logic in my head and decided I needed an unnecessarily-complex data structure to achieve security.
If nothing else, it is certainly convenient to pull in all the unspent outputs for an address without having to go all over the tree, as you can under your approach. That's a very common use case, so it would make sense to optimize for it.

etotheipi (OP)
Legendary
Core Armory Developer
June 18, 2012, 12:27:37 AM
 #19

Quote
The counter argument is that you will never find yourself in this position:  you are either downloading and processing the whole chain, in which case you have no problem replaying the incremental updates.  Or you download from some checkpoint, in which case you can still replay the insertions and deletions from that checkpoint with minimal effort.
Correct. So why does it matter? I'm not sure why it's “inelegant” either, since as you point out there is no use case for recreating the tree from only an unsorted list of outputs anyway.

I didn't point that out, because I don't think I agree with it.  It's a counter-argument that, while I can't dispute it directly at the moment, makes me extremely uncomfortable.  What future use case haven't we considered that would be stunted by this decision?  I'm only agreeing that I don't have a direct counter to the argument (and thus cannot fully defend my position).

But I'd hardly believe I'm the only one who would be bothered by it:  I think it's extremely inelegant that if I have every node in the tree, I still have to download gigabytes more data just to know how to organize it (or rather, I might as well just re-download the tree directly, at that point).

socrates1024
Full Member
Andrew Miller
June 18, 2012, 12:51:42 AM
Merited by ABCbits (1)
 #20

I had a very good reason for believing that you needed to aggregate the unspent-TxOuts (UTXOs) by address/script.  If you didn't do it, bad things would happen.  Unfortunately, I don't remember what those bad things were!

One of the neat things about a Merkle search structure, rather than just an arbitrarily-ordered Merkle tree, is that you can prove that a key is not in the database. Even with a typical Merkle tree, like the current blockchain, it would require a linear effort to prove that a transaction doesn't exist - assuming all you have is an O(1) root hash, and you don't trust anyone!

Even more generally, you can do a verified 'range query' for only O(M log N) effort (where M is the number of results, N is the size of the tree). If you store each unspent-coin in a binary search tree, ordered by the address, then you can ask an untrusted server to give you a snapshot of all the spendable coins associated with that address. There's no way for them to omit any.

Let me try to describe the scenario how I prefer, since it's hard to keep track of the terms otherwise. There are two parties, the Lite-Client (Client) and the Helper (Server). The goal is for the Client not to have to trust the Server, but for the Server to store all the data. The Client only ever needs to store (like, on disk) a constant O(1) amount of state - the root hash. In order to decide what root hash to use, the Client will have to rely on the proof-of-work to recognize the most recent block.

If the Client asks the Server for the list of unspent-coins matching a target address, he receives two or more O(log N) paths through the Merkle tree. The first one is a path to the element with the largest address (lexical ordering) that is smaller than the target address. The last one is a path to the smallest element with a larger address. If there were no transactions matching the target, then these two elements will be adjacent. In any case, the Client iterates through the paths he receives, at each step checking that the paths are adjacent in the tree (and, of course, that the hashes are consistent and lead to the root).
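
A sketch of the Client-side completeness check (verify_branch as in the earlier balance example; the explicit leaf positions are an illustration -- in a real scheme the branches themselves pin down adjacency):

Code:
def verify_range(root, target, results):
    # results: (position, key, leaf_hash, branch) tuples sorted by position,
    # spanning from the predecessor of `target` to its successor; assumes
    # at least the two boundary leaves are returned
    for pos, key, leaf, branch in results:
        if not verify_branch(leaf, branch, root):
            return False                       # bad Merkle proof
    positions = [pos for pos, _, _, _ in results]
    if positions != list(range(positions[0], positions[0] + len(positions))):
        return False                           # a leaf was omitted
    # the boundary leaves must bracket the target address
    return results[0][1] < target and results[-1][1] > target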

[edit]I hope this turns into a Merkle trees megathread![/edit]

FreeMoney
Legendary
Strength in numbers
June 18, 2012, 12:55:55 AM
 #21

Why would people mine this other chain?

If it has lower hashing power, is everyone trusting it more susceptible to double-spends somehow?

Play Bitcoin Poker at sealswithclubs.eu. We're active and open to everyone.
maaku
Legendary
June 18, 2012, 01:07:17 AM
 #22

Conceivably you could run an even lighter client that just implicitly trusts the head of the alt-chain. Such a client would rely upon honest miners doing the work of verifying the alt-chain, and not actually perform such checks itself.

Maged
Legendary
June 18, 2012, 04:12:10 AM
 #23

Why would people mine this other chain?
Why do miners currently mine transactions that don't have a fee? The answer to both of these questions is that doing such things isn't terribly costly, but more importantly, it encourages further adoption of Bitcoin, which will directly result in increased revenue for miners in the future.

Serith
Sr. Member
June 18, 2012, 06:52:14 AM
Last edit: June 18, 2012, 09:42:38 AM by Serith
 #24

I think that if you want the root value to be the same regardless of the order in which a tree or any hierarchical structure was created, then you would need a hash function that has commutative and associative properties.
Technomage
Legendary
Affordable Physical Bitcoins - Denarium.com
June 18, 2012, 09:50:08 AM
 #25

Why do miners currently mine transactions that don't have a fee? The answer to both of these questions are that doing such things aren't terribly costly, but more importantly, it encourages further adoption of Bitcoin which will directly result in increased revenue for miners in the future.
This is changing faster than we've anticipated though. Due to the sheer volume of transactions that SatoshiDice produces, higher fees will start to get priority. The bad thing is that current Bitcoin clients don't have a very sophisticated fee system or fee options.

Denarium closing sale discounts now up to 43%! Check out our products from here!
ffe
Sr. Member
June 18, 2012, 10:35:38 AM
 #26

Tracking
jl2012
Legendary
June 18, 2012, 10:38:18 AM
 #27

I have a proposal as an "add-on" to this

When a miner sees a tx, he could either 1. grab the tx fee, or 2. "donate" the tx fee to the alt-chain.

Grabbing the tx fee is essentially the current practice.

If it is "donated", it will go to a jackpot pool for the alt-chain. A miner who find a valid block in the alt-chain will claim the jackpot.

By donating the tx fee, miners will get a "discount" in their mining difficulty. The discount will be calculated in a pay-per-share manner. For example, the current difficulty is 1583177.847444 with a block reward of 50 BTC. A fair PPS would be 0.00003158. Therefore, a tx fee of 0.0005 is equivalent to 15.831778 shares. By donating the tx fee, the miner will only need to find a block with difficulty of 1583177.847444 - 15.831778 = 1583162.015666.

Miners will want to donate tx fees since this helps them find a block more easily and reduces variation. It also provides a feedback mechanism for mining. In a bad-luck streak where many tx accumulate, difficulty is reduced and the streak could end earlier.
The effective difficulty is no longer a constant throughout the 2016 blocks. Instead, it will be a zig-zag function: highest at the beginning of a round, decreasing as unconfirmed tx accumulate, and jumping back to the highest value when a block is found. The average block generation rate is still 10 minutes, but the variation is reduced too.

Some miners may abuse this system by creating tx with high tx fees (e.g. 25 BTC) and donating them to the alt-chain, so their difficulty is reduced by 50%. To prevent this, we may put a cap on the difficulty discount (e.g. 10% of difficulty) and/or calculate the discount with <100% PPS (so the expected return will be decreased).
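
Checking the numbers above (sketch):

Code:
difficulty = 1583177.847444
reward = 50.0
pps = reward / difficulty
print(pps)                        # ~0.00003158 BTC per difficulty-1 share
fee = 0.0005
print(fee / pps)                  # ~15.83 shares' worth of discount
print(difficulty - fee / pps)     # ~1583162.016 effective difficulty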

Donation address: 374iXxS4BuqFHsEwwxUuH3nvJ69Y7Hqur3 (Bitcoin ONLY)
LRDGENPLYrcTRssGoZrsCT1hngaH3BVkM4 (LTC)
PGP: D3CC 1772 8600 5BB8 FF67 3294 C524 2A1A B393 6517
flower1024
Legendary
June 18, 2012, 10:41:10 AM
 #28

isn't it possible to merge mine the balance chain just like many pools do with namecoin?
that way there is no need to give any reward to miners. just include it in bitcoind as a "protocol requirement" for mining.
etotheipi (OP)
Legendary
Core Armory Developer
June 18, 2012, 02:12:07 PM
 #29

isn't it possible to merge mine the balance chain just like many pools do with namecoin?
that way there is no need to give any reward to miners. just include it in bitcoind as a "protocol requirement" for mining.

Yup.  The second chain doesn't need an incentive, due to merged mining, unless it's particularly resource-consuming.  If the software is already written, and integrates into bitcoind or other mining software transparently, and doesn't impact mining performance, then miners would [likely] have no complaints about adopting it given the huge upside it offers.  It's basically free, from their perspective.

Though I don't agree it would be a "protocol requirement."  Miners have various reasons for doing what they do.  And this is being promoted as non-disruptive and optional.  But you don't need more than probably 20% of mining power engaged in this idea for it to be successful.  I'm not even sure what a 51% attack would look like on the alt-chain, but it wouldn't be very exciting -- the worst you could do is prevent some lite-nodes from being able to verify their own balance -- but there would be no financial gain for it, and it would cost a fortune.  20% seems like a number high enough that there would always be a "checkpoint" nearby, and high enough that it would be vastly too expensive for a prankster to do anything to that chain.


blueadept
Full Member
June 18, 2012, 02:29:00 PM
 #30

If the altchain has some place to put a scriptPubKey or an array of them in every block, the chain could be funded with network assurance contracts as proposed by Mike Hearn for the main chain when the subsidy gets lower than fees.

Like my posts?  Connect with me on LinkedIn and endorse my "Bitcoin" skill.
Decentralized, instant off-chain payments.
casascius
Mike Caldwell
VIP
Legendary
The Casascius 1oz 10BTC Silver Round (w/ Gold B)
June 18, 2012, 02:51:55 PM
Merited by ABCbits (1)
 #31

If the idea works and becomes essential to the way most people use Bitcoin, all developers could easily strategize and decide that a future version of clients will only relay/accept blocks when their coinbase contains a valid merged mining record for the other chain's most recent valid block.  It might then be properly called a "meta chain" rather than an "alt chain".

maaku
Legendary
June 18, 2012, 05:19:18 PM
 #32

It might then be properly called a "meta chain" rather than an "alt chain".
That's much better terminology. Thank you.

FreeMoney
Legendary
Strength in numbers
June 18, 2012, 05:35:18 PM
 #33


isn't it possible to merge mine the balance chain just like many pools do with namecoin?
that way there is no need to give any reward to miners. just include it in bitcoind as a "protocol requirement" for mining.

Yup.  The second chain doesn't need an incentive, due to merged mining, unless it's particularly resource-consuming.  If the software is already written, and integrates into bitcoind or other mining software transparently, and doesn't impact mining performance, then miners would [likely] have no complaints about adopting it given the huge upside it offers.  It's basically free, from their perspective.

Though I don't agree it would be a "protocol requirement."  Miners have various reasons for doing what they do.  And this is being promoted as non-disruptive and optional.  But you don't need more than probably 20% of mining power engaged in this idea for it to be successful.  I'm not even sure what a 51% attack would look like on the alt-chain, but it wouldn't be very exciting -- the worst you could do is prevent some lite-nodes from being able to verify their own balance -- but there would be no financial gain for it, and it would cost a fortune.  20% seems like a number high enough that there would always be a "checkpoint" nearby, and high enough that it would be vastly too expensive for a prankster to do anything to that chain.


Thanks, that makes sense.

theymos
Administrator
Legendary
June 18, 2012, 06:16:39 PM
 #34

Trustless lightweight-node support:  New nodes entering the network for the first time will only have to download a tiny amount of data to get full, verifiable knowledge of their balance and how to spend it (much of which can be stored between loads).  A single honest peer out of thousands guarantees you get, and recognize, good data.

It doesn't seem trustless to me. Lightweight nodes (not storing all unspent outputs) can't know whether a block is valid, so they need to trust the majority of the network's mining power. This is no more secure than SPV, though possibly a little easier for lightweight nodes.

There is an advantage to "mostly-full" nodes who store all unspent transactions: they don't have to download all past blocks to start validating. But a regular Merkle tree of unspent outputs works just as well.

Using an alt chain instead of putting the Merkle root in a main-chain transaction seems unnecessarily complex.

1NXYoJ5xU91Jp83XfVMHwwTUyZFK64BoAD
etotheipi (OP)
Legendary
Core Armory Developer
June 18, 2012, 06:43:45 PM
Merited by ABCbits (1)
 #35

Trustless lightweight-node support:  New nodes entering the network for the first time will only have to download a tiny amount of data to get full, verifiable knowledge of their balance and how to spend it (much of which can be stored between loads).  A single honest peer out of thousands guarantees you get, and recognize, good data.

It doesn't seem trustless to me. Lightweight nodes (not storing all unspent outputs) can't know whether a block is valid, so they need to trust the majority of the network's mining power. This is no more secure than SPV, though possibly a little easier for lightweight nodes.

This doesn't make sense at all.  The entirety of all Bitcoin trust relies on trusting "the majority of the network's mining power."  That is the security model of Bitcoin itself, and has the benefit that you only need one honest node out of one million to be able to distinguish truth amongst a sea of malicious peers.

The issue with SPV is that you have to trust either random peers to give you the correct/complete info, or connect to your own trusted system/subscription to get reliable information.  Without a subscription service, you have to query lots of peers and possibly use filtering/profiling games with peers to identify untrusted nodes, etc.   With this idea, you minimize the amount of information that needs to be downloaded, and can compare directly against the headers -- you can get all the information you need from a single peer and know it's right (assuming you've already got the headers).

And that is actually just a side-benefit of it -- the original goal was blockchain compression, which this does near-optimally.

I agree that complexity is high.  But I disagree with the notion that you're not getting something for it.

Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
theymos
Administrator
Legendary
*
Offline Offline

Activity: 5138
Merit: 12564


View Profile
June 18, 2012, 07:01:04 PM
 #36

This doesn't make sense at all.  The entirety of all Bitcoin trust relies on trusting "the majority of the network's mining power."  That is the security model of Bitcoin itself, and has the benefit that you only need one honest node out of one million to be able to distinguish truth amongst a sea of malicious peers.

If I have a copy of all unspent outputs, the majority of mining power can only change the ordering of transactions (and this ability might be more limited in the future). But a lightweight node will accept double-spends within the block chain, too-high block subsidies, invalid scripts, etc. if the attacker has enough mining power.

Quote from: etotheipi
The issue with SPV is that you have to trust either random peers to give you the correct/complete info, or connect to your own trusted system/subscription to get reliable information.

You don't have to trust random peers with SPV. SPV clients have the block headers and can use the Merkle roots to accurately get the number of confirmations for any transaction.

1NXYoJ5xU91Jp83XfVMHwwTUyZFK64BoAD
socrates1024
Full Member
***
Offline Offline

Activity: 126
Merit: 108


Andrew Miller


View Profile
June 18, 2012, 07:08:23 PM
 #37

If I have a copy of all unspent outputs, the majority of mining power can only change the ordering of transactions (and this ability might be more limited in the future). But a lightweight node will accept double-spends within the block chain, too-high block subsidies, invalid scripts, etc. if the attacker has enough mining power.

Notice that if the lightweight Client runs the Merkle tree scheme I described, it can reject double-spends, invalid scripts, etc., even without needing to store a copy of the unspent outputs.

Instead, the Client receives each block header, as well as O(M log N) of verification data (paths through the Merkle tree). This way, the Client can check each transaction against the database, as indicated by the current root hash. Only a full node acting as an untrusted helper needs to store the entire unspent outputs database, in order to generate the verification data.
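
A minimal sketch of that check in Python (single SHA-256 and this particular pairing rule are just for illustration; Bitcoin itself uses double-SHA256):

Code:
import hashlib

def h(data: bytes) -> bytes:
    # Single SHA-256 for brevity; Bitcoin itself uses double-SHA256.
    return hashlib.sha256(data).digest()

def verify_branch(leaf: bytes, branch, root: bytes) -> bool:
    """Recompute the root from a leaf and its merkle branch.
    branch: list of (sibling_hash, sibling_is_left) pairs, leaf upward."""
    node = h(leaf)
    for sibling, sibling_is_left in branch:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root

# Tiny four-leaf example:
l = [h(x) for x in (b"txout-a", b"txout-b", b"txout-c", b"txout-d")]
n01, n23 = h(l[0] + l[1]), h(l[2] + l[3])
root = h(n01 + n23)
# Branch for "txout-c": sibling l[3] on the right, then n01 on the left.
assert verify_branch(b"txout-c", [(l[3], False), (n01, True)], root)

A lite client holding only block headers would apply exactly this check to each path in the verification data.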

amiller on freenode / 19G6VFcV1qZJxe3Swn28xz3F8gDKTznwEM
[my twitter] [research@umd]
I study Merkle trees, credit networks, and Byzantine Consensus algorithms.
etotheipi (OP)
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1093


Core Armory Developer


View Profile WWW
June 18, 2012, 07:13:33 PM
 #38

Quote from: etotheipi
The issue with SPV is that you have to trust either random peers to give you the correct/complete info, or connect to your own trusted system/subscription to get reliable information.

You don't have to trust random peers with SPV. SPV clients have the block headers and can use the Merkle roots to accurately get the number of confirmations for any transaction.

Yes, but how do you know a particular TxOut hasn't been spent since it appeared in the blockchain?  I can verify, by downloading full transactions, that a particular TxOut was valid at one point in time, but I have no way to determine if it's been spent since that time.  Whether I'm verifying my own balance, or trying to confirm the validity of the inputs of another transaction, any malicious node can give me seemingly correct information, but still make my node incapable of operating -- perhaps because when I got online the first time and imported my wallet, I was given only old TxOuts that have since been spent, and my node will be stuck creating invalid transactions trying to spend them.

This is the basis of the overlay network that Electrum/Stratum uses.  I don't think there's anything wrong with that overlay network, other than that it relies on trusted nodes being set up and available, or on subscribing to a service, all of which is a degree of centralization (and extra user effort).  This chain avoids all of it, providing (at the expense of complexity) more reliable information without trusting anyone.

If that was the only benefit, I'd agree it wouldn't be worth the hassle.  But as I said, that's a secondary benefit of this structure.  The main reason to do it is compression.

Which could be implemented with a simpler tree structure using an alt/meta-chain.  Or this tree-structure using a different distribution technique (protocol change, overlay network, trusting random peers, etc).

Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
theymos
Administrator
Legendary
*
Offline Offline

Activity: 5138
Merit: 12564


View Profile
June 18, 2012, 07:32:02 PM
 #39

This is the basis of the overlay network that Electrum/Stratum uses.

Electrum and Stratum don't use SPV. Those clients don't keep block headers (AFAIK). BitcoinJ uses SPV.

Yes, but how do you know a particular TxOut hasn't been spent since it appeared in the blockchain?

I wait until it has 6 confirmations. SPV allows me to determine the number of confirmations accurately (assuming the majority of mining power is honest).

1NXYoJ5xU91Jp83XfVMHwwTUyZFK64BoAD
etotheipi (OP)
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1093


Core Armory Developer


View Profile WWW
June 18, 2012, 09:25:36 PM
 #40

This is the basis of the overlay network that Electrum/Stratum uses.

Electrum and Stratum don't use SPV. Those clients don't keep block headers (AFAIK). BitcoinJ uses SPV.

Yes, but how do you know a particular TxOut hasn't been spent since it appeared in the blockchain?

I wait until it has 6 confirmations. SPV allows me to determine the number of confirmations accurately (assuming the majority of mining power is honest).

I'm not concerned about enough confirmations.  I'm actually concerned that it has 10,000 confirmations, that it was actually spent 3,000 blocks ago, and that I have to search through 10,000 full blocks to know whether it's still a valid output.  Or I can ask other nodes for help, but how do I know to trust them?

Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
theymos
Administrator
Legendary
*
Offline Offline

Activity: 5138
Merit: 12564


View Profile
June 18, 2012, 10:38:31 PM
 #41

I'm not concerned about enough confirmations.  I'm actually concerned that it has 10,000 confirmations, that it was actually spent 3,000 blocks ago, and that I have to search through 10,000 full blocks to know whether it's still a valid output.  Or I can ask other nodes for help, but how do I know to trust them?

SPV nodes keep copies of all of their own transactions, so they should always know when outputs they own are spent. They don't need to query the network for this information.

1NXYoJ5xU91Jp83XfVMHwwTUyZFK64BoAD
maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1011


View Profile
June 18, 2012, 10:45:07 PM
 #42

SPV nodes keep copies of all of their own transactions, so they should always know when outputs they own are spent. They don't need to query the network for this information.
What if you have the same wallet on two systems?

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
galambo
Sr. Member
****
Offline Offline

Activity: 966
Merit: 311



View Profile
June 18, 2012, 11:17:08 PM
Last edit: June 19, 2012, 03:55:44 AM by galambo
 #43

No clue about any of this stuff, but thank you guys for working on this.  I think it's important to have this sorted out before adoption and use gets really heavy.

I think it's important that everybody understands this, and I didn't see anyone explaining it in general terms.

The blockchain is a distributed computer file system containing a double-entry accounting ledger. Each transaction has two sides, which you may be familiar with from accounting: inputs and outputs, OR debits and credits. However, a major difference is that bitcoin forces a credit (output) to exist for every debit (input).  Storing all of this takes a lot of space. extra explanation

This proposal will continuously "balance the books." In accounting, when you close out the books for the quarter, all of the debits and credits are summed, and the difference between the two is entered as a "balance" transaction. Because we know that bitcoin forces every debit (input) to claim a prior credit (output), we only have to keep track of all credits (outputs) that are not claimed by a debit (input) to obtain the balance of each address.
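
A toy sketch of that balancing in Python (the structures and names here are made up for illustration): once spent outputs are deleted, an address's balance is just the sum of what remains.

Code:
# Toy UTXO set: (txid, output_index) -> (address, value).
utxo = {}

def apply_tx(txid, inputs, outputs):
    """inputs: list of (txid, index) debits; outputs: list of (address, value) credits."""
    for spent in inputs:
        del utxo[spent]                      # every debit consumes a prior credit
    for i, (addr, value) in enumerate(outputs):
        utxo[(txid, i)] = (addr, value)      # every credit is a new unspent output

def balance(address):
    return sum(v for (a, v) in utxo.values() if a == address)

apply_tx("coinbase1", [], [("alice", 50)])
apply_tx("tx2", [("coinbase1", 0)], [("bob", 20), ("alice", 30)])
print(balance("alice"))   # 30 -- only the unclaimed credit counts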

The proposal is for a system to store the references to these unspent outputs in a data structure for quick downloading. It doesn't suggest how this tree would be updated efficiently, or how you would quickly grab all of the unspent outputs belonging to one address. This is under discussion.
etotheipi (OP)
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1093


Core Armory Developer


View Profile WWW
June 19, 2012, 01:30:45 AM
Merited by ABCbits (1)
 #44

I'm not concerned about enough confirmations.  I'm actually concerned that it has 10,000 confirmations, that it was actually spent 3,000 blocks ago, and that I have to search through 10,000 full blocks to know whether it's still a valid output.  Or I can ask other nodes for help, but how do I know to trust them?

SPV nodes keep copies of all of their own transactions, so they should always know when outputs they own are spent. They don't need to query the network for this information.

Sure they don't.  Until they do.  Maybe they haven't been online in a couple of weeks and 2 GB of blockchain has happened since then.  Or maybe they want to check another address that they never owned to begin with.  Or maybe you just imported your wallet to another device (as Maaku said), or your HDD failed and you are restoring from backup.  Are you now expected to become a nearly-full node and rescan the entire transaction history since your wallet was created 2 years ago?

You can ask peers for it, and do fancy stuff to convince yourself the history you have been given is real and complete.  Or you can avoid all of that and just get it from any peer you see on the network, without having to trust anyone other than the majority hashing power.  Download it, verify it, and you're done.




Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1135


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
June 19, 2012, 04:43:37 AM
Last edit: June 19, 2012, 06:22:26 AM by casascius
 #45

OK, so I read that one of the problems making this tree work would be having to deal with recomputing it after each transaction.

What if the meta tree were not kept in a chain?  What if it were just a living thing that could be maintained by peers, and the only thing that would be kept in the chain is the merkle root of the meta tree put into the coinbase transaction?  A peer could request the whole tree from any peer who had it (assuming the peer "offered" the tree).

Further, to lower the overhead associated with updating the tree, let's say that a complete tree rebalance happens every 100 blocks or so.  Until then (i.e. unless blockheight % 100 == 0), leaf nodes would be removed from the tree by replacing them with a placeholder leaf node that would sort into the same place.  This way, one node can prove to another that no real records exist for that address by serving the placeholder node.
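
A rough sketch of that bookkeeping over a plain sorted list (Python; the real structure would be a tree, and the details here are illustrative only):

Code:
import bisect

leaves = []   # sorted (key, payload) pairs; payload None marks a placeholder

def insert_leaf(key, payload):
    bisect.insort(leaves, (key, payload))

def remove_leaf(key):
    # Swap the leaf for a placeholder that sorts into the same place, so a
    # peer can still prove "no live record exists for this key".
    i = bisect.bisect_left(leaves, (key,))
    if i < len(leaves) and leaves[i][0] == key:
        leaves[i] = (key, None)

def maybe_rebalance(block_height):
    global leaves
    if block_height % 100 == 0:              # full rebuild at the checkpoint
        leaves = [(k, p) for (k, p) in leaves if p is not None]

insert_leaf("1ExampleAddr", "txout data")
remove_leaf("1ExampleAddr")
print(leaves)            # [('1ExampleAddr', None)] -- provably removed
maybe_rebalance(200)
print(leaves)            # [] -- placeholders swept out every 100 blocks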

For security's sake, each peer offering the meta tree also ought to offer several versions (at least 2) of the meta tree as it looked when it was recreated at a 100-block checkpoint (more on that later).

So to illustrate this with an example...

I am a bitcoin client that comes to the network.  (This hypothetical network has evolved to the point where including the correct merkle root of the current meta tree in the coinbase is mandatory.)

First, I get all of the 80-byte headers going back to the genesis block and persuade myself that I have them all when I do.

Then, if I am interested in my own copy of the whole meta tree (for the good of the network), I ask a peer for it.  The peer sends it to me in its entirety.  I validate it against the merkle root I believe to be valid.

Instead of requesting the up-to-the-minute meta tree, I could request the meta tree 2 checkpoints ago (between 100-200 blocks ago) along with the corresponding full blocks with all the transactions, and play those transactions back.  This will help me be sure that I'm not looking at an orphan fork that completely replaced the meta tree with something else.

If I am NOT interested in my own copy of the whole meta tree, I give a peer a Bitcoin address and ask it to give me the leaf nodes surrounding that address in the sort order, along with the tree lineage to prove it's part of the tree.

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
tevirk
Newbie
*
Offline Offline

Activity: 15
Merit: 0



View Profile
June 19, 2012, 06:20:35 AM
 #46

I don't see that even the complete tree of outstanding balances needs to be validated anyway.

To validate a transaction, I need answers to these questions:

  • What block (number and hash) is the latest?
  • In which block (number and hash) ancestral to block aaaaa is transaction xxxxx?
  • Is output n of transaction xxxxx spent in any block between number bbbbb (with hash ppppp) and number ccccc (with hash qqqqq)?

A service can give signed answers to all those questions.  If it ever gives a wrong answer, it can be checked and proved wrong by anyone with the complete blockchain.   It wouldn't be hard for a client to do random spot-checks on a service at a low rate -- again, if it ever gets a signed wrong answer, it has proof for all to see that the service it used was unreliable.
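
A minimal sketch of that accountability loop (Python, with HMAC standing in for a real digital signature -- a deployed service would sign with a public key so anyone could verify -- and a toy spent-output set):

Code:
import hashlib, hmac, json

SERVICE_KEY = b"toy key -- a real service would use ECDSA"
spent = {("tx-xxxxx", 0)}    # toy view of outputs spent in blocks bbbbb..ccccc

def signed_answer(question):
    answer = dict(question, spent=tuple(question["outpoint"]) in spent)
    blob = json.dumps(answer, sort_keys=True).encode()
    sig = hmac.new(SERVICE_KEY, blob, hashlib.sha256).hexdigest()
    return {"answer": answer, "sig": sig}

def is_fraud_proof(reply, truth):
    """A valid signature over a wrong answer is itself the proof of
    misbehavior, checkable by anyone who holds the full blockchain."""
    blob = json.dumps(reply["answer"], sort_keys=True).encode()
    expected = hmac.new(SERVICE_KEY, blob, hashlib.sha256).hexdigest()
    return hmac.compare_digest(reply["sig"], expected) and reply["answer"]["spent"] != truth

q = {"outpoint": ["tx-xxxxx", 0], "from": "bbbbb", "to": "ccccc"}
reply = signed_answer(q)
print(is_fraud_proof(reply, truth=True))   # False -- the service answered honestly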

Running a service like that is easy: all you need is enough CPU to keep up with the blockchain, which is trivial, and enough disk to store the blockchain and indexes -- currently about 5GB, say 1TB looking forward a few years.  Again, that is trivial.  The services could be run by ISPs, exchanges, online stores, and enthusiasts, and could easily be ad-supported.  It could easily be made into an appliance that would cost only about USD 200 at current prices, meaning any retailer using bitcoin regularly could have one of their own.  The client could check with a few different services: again, any wrong answer can be proved wrong by anyone with the blockchain.

That surely is the way forward in the long term.
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1135


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
June 19, 2012, 06:26:33 AM
 #47

I don't see that even the complete tree of outstanding balances needs to be validated anyway.
...

If it ever gives a wrong answer, it can be checked and proved wrong by anyone with the complete blockchain.

...

That surely is the way forward in the long term.


This thread is about a proposal to one-up that: to create a service whose wrong answers can be proven wrong instantly, automatically, and cryptographically without anything more than a single merkle root you are certain represents the latest state of the chain.  That's better than trusting a service where the only recourse is that you could rat them out if you manage to catch them lying, which you'll never be in a position to do unless you're also in a position to not need their services.

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
Realpra
Hero Member
*****
Offline Offline

Activity: 815
Merit: 1000


View Profile
June 19, 2012, 06:34:18 AM
 #48

Once you cut down on the section of the blockchain you need to download while still verifying, you have essentially a swarm node.

I suggested something very similar in my thread about pruning not being enough.

However, if you create a tx log instead of a merkle tree, then you break compatibility, since the "root hash/log signature" will be different.


My proposal used heavily pruned merkle tree sections to send individual transactions without trust and had no forking.

You would check a new address really had funds by asking the network for the branches concerning just that address.

Even if the majority of nodes withheld information you would only need one honest node to get the complete picture in a short download.

Cheap and sexy Bitcoin card/hardware wallet, buy here:
http://BlochsTech.com
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1135


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
June 19, 2012, 06:50:30 AM
 #49

My proposal used heavily pruned merkle tree sections to send individual transactions without trust and had no forking.

You would check a new address really had funds by asking the network for the branches concerning just that address.

Even if the majority of nodes withheld information you would only need one honest node to get the complete picture in a short download.

How would that compare to maintaining a merkle tree with an enforced sort order: sorted by address?  Then the odds of accepting an answer from a lying node would be zero.  No need to seek a consensus.

If there were an enforced sort order, a response to a query could reliably prove that the responder returned the entire set of records matching the query (or prove none existed), simply by providing a contiguous chunk of leaf nodes that completely contains the query, along with everything up to the merkle root.  Sort of how I could prove "Gordon Dinklebutt" is or isn't in the phone book by providing copies of contiguously numbered pages that include both "Dickson" (which is less than Dinklebutt) and "Dipweed" (which is greater), as well as everything in between.
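
A sketch of that phone-book check over a toy sorted merkle tree (Python; single SHA-256 and odd-node duplication are brevity choices, not a spec):

Code:
import hashlib

def h(b): return hashlib.sha256(b).digest()

def build_levels(leaf_hashes):
    levels = [leaf_hashes]
    while len(levels[-1]) > 1:
        lvl = levels[-1]
        if len(lvl) % 2: lvl = lvl + [lvl[-1]]          # duplicate odd node
        levels.append([h(lvl[i] + lvl[i+1]) for i in range(0, len(lvl), 2)])
    return levels

def branch(levels, index):
    path = []
    for lvl in levels[:-1]:
        if len(lvl) % 2: lvl = lvl + [lvl[-1]]
        sib = index ^ 1
        path.append((lvl[sib], sib < index))
        index //= 2
    return path

def verify(leaf_hash, path, root):
    for sib, sib_is_left in path:
        leaf_hash = h(sib + leaf_hash) if sib_is_left else h(leaf_hash + sib)
    return leaf_hash == root

names = sorted([b"Adams", b"Dickson", b"Dipweed", b"Zebra"])  # enforced sort order
levels = build_levels([h(n) for n in names])
root = levels[-1][0]

# Prove "Dinklebutt" is absent: the two adjacent leaves straddle the query
# and both verify against the root (branch positions show adjacency).
i = names.index(b"Dickson")
absent = (names[i] < b"Dinklebutt" < names[i+1]
          and verify(h(names[i]),   branch(levels, i),   root)
          and verify(h(names[i+1]), branch(levels, i+1), root))
print(absent)   # True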

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1135


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
June 19, 2012, 07:04:22 AM
Last edit: June 19, 2012, 07:28:54 AM by casascius
 #50

If the idea of having a meta tree gained a following, then I'd almost like to throw out the following on top of it:  have two meta trees.

A first class meta tree and a second class meta tree.  The purpose of having two meta trees would be so that someone operating a P2P node can choose to offer resources to the network without having to offer unlimited resources, and to have those resources give maximum benefit to the network.  Think: a user who would rather contribute to supporting real transactions instead of spam and bitdust.

The first class meta tree would be constrained in size, and would contain the most important txouts to the network.  "Important" being determined by value as well as age.  The goal would be that everything will end up in the first class tree except for bitdust.

The second class meta tree would contain everything, including the bitdust.

Users would be able to choose the level of resource commitment to the P2P network.  In today's terms, hosting the first tree could be a comfortable 100 megabyte commitment, and you would do it for selfish reasons: fast transaction processing and minimum transaction fees (explained later).  If the first tree had a size cap (perhaps that size is algorithmically pre-determined to follow Moore's law), people could give comfortably to the network.  

The second tree - which would contain everything that didn't fit in the first tree - could help force people putting the biggest burden on the network to pay the most.  Miners would have no choice but to host both trees, but miners are also in a position to force a more generous transaction fee for access to those transactions.

Bitdust would become second-class currency, but easily upgraded to first-class by spending and consolidating it, and waiting a bit.  Owners of bitdust would bear two burdens: a greater transaction fee, and a greater confirmation time when spent to someone who chooses to be agnostic of the second tree.

Someone running a client that only concerned itself with the first-class tree would learn about incoming bitdust transactions not immediately, but only after a miner mined the transaction spending the bitdust into a block, and then that block reached a threshold of anywhere between 6 and 100 confirmations to meet the criteria for it to be integrated into the first-class tree.  As long as it was spent in a manner that consolidated it into a bigger transaction instead of leaving it in the form of dust, it would remain first-class currency, likely permanently.
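
To illustrate, here is one made-up scoring rule (value times age, with an arbitrary size cap -- just one possibility, not a spec):

Code:
def classify(txouts, tip_height, first_class_cap=3):
    """Split txouts between the two meta trees by value * age, keeping
    the first-class tree under a fixed size cap.
    txouts: list of (outpoint, value, creation_height)."""
    score = lambda t: t[1] * (tip_height - t[2] + 1)    # rough "coin-blocks"
    ranked = sorted(txouts, key=score, reverse=True)
    return ranked[:first_class_cap], ranked[first_class_cap:]

utxos = [("big-old", 500, 10), ("big-new", 500, 990),
         ("dust-1", 1, 100), ("dust-2", 1, 950), ("mid", 50, 500)]
first, second = classify(utxos, tip_height=1000)
print([t[0] for t in first])    # ['big-old', 'mid', 'big-new']
print([t[0] for t in second])   # ['dust-1', 'dust-2'] -- bitdust is second-class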

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
Realpra
Hero Member
*****
Offline Offline

Activity: 815
Merit: 1000


View Profile
June 19, 2012, 07:04:29 AM
 #51

My proposal used heavily pruned merkle tree sections to send individual transactions without trust and had no forking.

You would check a new address really had funds by asking the network for the branches concerning just that address.

Even if the majority of nodes withheld information you would only need one honest node to get the complete picture in a short download.

How would that compare to maintaining a merkle tree with an enforced sort order: sorted by address?  Then the odds of accepting an answer from a lying node would be zero.  No need to seek a consensus.

My solution did not seek consensus; as I said, once you have the branches with transactions linked to the main hash, you KNOW they are true.
Even if 90% of the network is withholding the "spent it all" transaction, you would easily be able to get it from just ONE honest node.

How is the alternate merkle tree even safe with no/little mining? I could make a false log, sign it with minimal mining or put it in the blockchain (both easy), and fool you all, right?

Quote
If there were an enforced sort order, a response to a query could reliably prove that the responder returned the entire set of records matching the query (or prove none existed), simply by providing a contiguous chunk of leaf nodes that completely contains the query, along with everything up to the merkle root.  Sort of how I could prove "Gordon Dinklebutt" is or isn't in the phone book by providing copies of contiguously numbered pages that include both "Dickson" (which is less than Dinklebutt) and "Dipweed" (which is greater), as well as everything in between.
That only works if the phone book is complete - what if you are not sent the most up-to-date altchain/log-block? What if I send you false ones?

My solution had further advantages as NO ONE would need the full block - a big advantage as miners can't run lite nodes as it is.

My solution also solved propagation time problems blocks have today - this solution would not as it relies on the normal blockchain.

All of this without forking or merged mining.

Cheap and sexy Bitcoin card/hardware wallet, buy here:
http://BlochsTech.com
Realpra
Hero Member
*****
Offline Offline

Activity: 815
Merit: 1000


View Profile
June 19, 2012, 07:07:28 AM
 #52

If the idea of having a meta tree gained a following, then I'd almost like to throw out the following on top of it:  have two meta trees.
My solution already has this in it, by looking at branches.

Each node would be able to choose the depth of the merkle tree at which it wanted to verify.

All nodes could choose different points.

Cheap and sexy Bitcoin card/hardware wallet, buy here:
http://BlochsTech.com
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1135


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
June 19, 2012, 07:18:29 AM
 #53

My solution did not seek consensus; as I said, once you have the branches with transactions linked to the main hash, you KNOW they are true.
Even if 90% of the network is withholding the "spent it all" transaction, you would easily be able to get it from just ONE honest node.

The problem is that you still have no certain way to know if a transaction is being withheld.  An attacker controlling your upstream connectivity (an attack that happens all the time in the real world) can easily engineer your view of the network such that you only see the nodes of the attacker's choice.  I am not sure many people will view that as acceptable.  This is very important, because if someone can withhold from you the knowledge that an incoming transaction is invalid because it's a double-spend, then you're vulnerable to double-spending attacks.

How is the alternate merkle tree even safe with no/little mining? I could make a false log, sign it with minimal mining or put it in the blockchain (both easy), and fool you all, right?

By imposing a requirement that the merkle root be in the main chain's coinbase.  That essentially makes it "mandatory merged mining".

Such an idea might start out as a novelty, where the tree is maintained voluntarily and can't really be relied upon, but results in a huge improvement when you choose to rely upon it, albeit at a risk.  The developers may then say, "That improvement is great, let's eliminate the risk by making it mandatory to provide the correct merkle root of the meta tree(s) rather than optional, as a condition of a block to be accepted by the network."

That only works if the phone book is complete - what if you are not sent the most up-to-date altchain/log-block? What if I send you false ones?

A false meta tree would be outed by its root not matching what's in the block headers of the main chain.

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
Maged
Legendary
*
Offline Offline

Activity: 1204
Merit: 1015


View Profile
June 19, 2012, 07:30:16 AM
 #54

An individual user client would likely use the tree 6 meta-blocks or more back to ensure that they are getting an accurate picture of things. Merchants would likely go back 100-1000 meta-blocks, and new miners would go back at least 10,000 meta-blocks.

As long as at least one person forever stores the entire blockchain (but maybe even not, if people trust that the meta chain was accurate for the past several years), those limits should provide plenty of warning and safety in case the meta chain gets 51% attacked.

casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1135


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
June 19, 2012, 07:36:57 AM
 #55

...and new miners would go back at least 10,000 meta-blocks...

Even if new miners chose not to do this and successfully got tricked with a bogus view of the meta tree, their worst case is that they produce invalid blocks that become farts in the wind.  But I suppose mining pools would have a responsibility to do this.

As long as at least one person forever stores the entire blockchain (but maybe even not, if people trust that the meta chain was accurate for the past several years), those limits should provide plenty of warning and safety in case the meta chain gets 51% attacked.

And if the meta chain could be made an integral part of Bitcoin to the point that mining it was mandatory to mine Bitcoin, then the only way to 51% attack the meta chain would be to successfully 51% attack Bitcoin, which I would find comforting.

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
Realpra
Hero Member
*****
Offline Offline

Activity: 815
Merit: 1000


View Profile
June 19, 2012, 07:37:05 AM
 #56

My solution did not seek consensus; as I said, once you have the branches with transactions linked to the main hash, you KNOW they are true.
Even if 90% of the network is withholding the "spent it all" transaction, you would easily be able to get it from just ONE honest node.

The problem is that you still have no certain way to know if a transaction is being withheld.  An attacker controlling your upstream connectivity (an attack that happens all the time in the real world) can easily engineer your view of the network such that you only see the nodes of the attacker's choice.
He can't control you if you choose to connect to, say, the MtGox node or some other trusted* node via public-key encryption.

The attacker would be unable to understand such secure messages - provided you have a known good public key to write to.

(* Not really trusted, just ANYONE who doesn't realize which txs the attacker wants withheld.)

Quote
By imposing a requirement that the merkle root be in the main chain's coinbase.  That essentially makes it "mandatory merged mining".

So that is basically a fork, right? My solution doesn't have that.

Anyway, what if I am a miner and I include a signature of an incomplete log in my coinbase?

The ONLY way to tell my log was incomplete would be to download the entire chain.

Quote
Such an idea might start out as a novelty
A swarm client would be instantly useful and has similar/same programmatic complexity as this.

Heck mining pools might be richly rewarded by adopting swarm clients. (by being able to have larger pools/more txs/fees in a block)

Quote
A false meta tree would be outed by its root not matching what's in the block headers of the main chain.
I was lucky enough to put it there as a miner, so the signature has been merge-mined and all the txs inside are valid - I just didn't include the one where I moved a bunch of coins.

The only way to out that is to check the entire chain yourself.

I think that, with every miner motivated to do the above attack, my solution is safer.

Cheap and sexy Bitcoin card/hardware wallet, buy here:
http://BlochsTech.com
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1135


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
June 19, 2012, 07:46:39 AM
 #57

He can't control you if you choose to connect to, say, the MtGox node or some other trusted node via public-key encryption.

The attacker would be unable to understand such secure messages - provided you have a known good public key to write to.

I don't see that happening any time soon, as it would be opposite Bitcoin's design goals of decentralization.

So that is basically a fork right? My solution doesn't have that.

Right; instead, your well-intended solution has a double-spend vulnerability, easily exploited by any upstream provider, that can only be mitigated by connecting to a known centralized server that you think you can "trust" (the opposite of peer-to-peer).

Anyway, what if I am a miner and I include a signature of an incomplete log in my coinbase?

The ONLY way to tell my log was incomplete would be to download the entire chain.

Yep, you are right: so long as the meta chain were experimental and non-mandatory, anybody could throw anything they want in the coinbase, including a completely falsified meta merkle root.  But consumers of the meta chain would depend on nothing that didn't have 6 confirmations (meta-chain confirmations in the bitcoin blockchain, not just 6 bitcoin blocks).  Your hash power would have to exceed that of those putting in honest logs; essentially you would be attempting to attack the meta chain, and would need 51% of the meta chain's hash power to succeed.

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
HostFat
Staff
Legendary
*
Offline Offline

Activity: 4200
Merit: 1202


I support freedom of choice


View Profile WWW
June 19, 2012, 07:51:25 AM
 #58

I don't completely understand all of your proposals, but I stick with the one that works in all cases with this rule: "trust no one".

NON DO ASSISTENZA PRIVATA - http://hostfatmind.com
Realpra
Hero Member
*****
Offline Offline

Activity: 815
Merit: 1000


View Profile
June 19, 2012, 07:58:33 AM
 #59

He can't control you if you choose to connect to, say, the MtGox node or some other trusted node via public-key encryption.

The attacker would be unable to understand such secure messages - provided you have a known good public key to write to.

I don't see that happening any time soon, as it would be opposite Bitcoin's design goals of decentralization.
I think you misunderstand: you don't need to connect to a TRUSTED node per se, just ANY node that is not colluding with the attacker.

ANY will do. Could even be a different ATTACKER that didn't know what the FIRST attacker wanted hidden!

As for secure communication that is pretty standard, BTC should have it already if it doesn't.

Quote
So that is basically a fork, right? My solution doesn't have that.

Right; instead, your well-intended solution has a double-spend vulnerability, easily exploited by any upstream provider, that can only be mitigated by connecting to a known centralized server that you think you can "trust" (the opposite of peer-to-peer).
Read above.

Quote
But consumers of the meta chain would depend on nothing that didn't have 6 confirmations (meta-chain confirmations in the bitcoin blockchain, not just 6 bitcoin blocks).
That would be DAYS of confirmation time in the beginning; who would use that to any great extent?

Quote
Your hash power would have to exceed that of those putting in honest logs; essentially you would be attempting to attack the meta chain, and would need 51% of the meta chain's hash power to succeed.
You would be relying on SOMEONE checking that all those 6 logs are complete and then what? REPORTING it if not? Dumping the entire chain?
What miner would do that for an alt chain log?

As for reporting, yep you just arrived at part 1 of my solution, welcome.

Cheap and sexy Bitcoin card/hardware wallet, buy here:
http://BlochsTech.com
Realpra
Hero Member
*****
Offline Offline

Activity: 815
Merit: 1000


View Profile
June 19, 2012, 08:01:30 AM
 #60

I don't completely understand all of your proposals, but I stick with the one that works in all cases with this rule: "trust no one".
My solution is basically: trust that 1 guy out of 1000 is honest - or run the client you run today, with massive lag/huge fees.

Cheap and sexy Bitcoin card/hardware wallet, buy here:
http://BlochsTech.com
Maged
Legendary
*
Offline Offline

Activity: 1204
Merit: 1015


View Profile
June 19, 2012, 08:02:03 AM
 #61

...and new miners would go back at least 10,000 meta-blocks...

Even if new miners chose not to do this and successfully got tricked with a bogus view of the meta tree, their worst case is that they produce invalid blocks that become farts in the wind.  But I suppose mining pools would have a responsibility to do this.
I would argue that as long as new miners that are bootstrapping at any given time are only a small % of the hash power, they'd be stupid not to verify that far back. Any misplaced trust in the recent meta-blocks could cause them to create an invalid block, which would be a terrible financial loss. In fact, for this reason, many miners will likely opt to always hold the entire chain, and not trust the meta-blocks at all. I consider that a good thing.

Generally, only users and merchants should be using the meta-chain to bootstrap, although I won't be that disappointed if miners eventually have to use it too, as long as they're careful.

casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1135


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
June 19, 2012, 08:06:52 AM
 #62

@realpra How will you connect to any node not controlled by an attacker if I, the attacker, control your upstream Internet and am redirecting your connection attempts to nodes I control?  You think you are connected to node X by its IP, but you are really connected to my node Y and have no way to know.

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
Maged
Legendary
*
Offline Offline

Activity: 1204
Merit: 1015


View Profile
June 19, 2012, 08:08:04 AM
 #63

Quote
But consumers of the meta chain would depend on nothing that didn't have 6 confirmations (meta-chain confirmations in the bitcoin blockchain, not just 6 bitcoin blocks).
That would be DAYS of confirmation time in the beginning; who would use that to any great extent?
Many people. Imagine if all you had to do today to bootstrap was to download a week of blocks. A low-end laptop can do that in less than an hour today, and that's without all of the code optimizations the Bitcoin implementations will have in the future, not to mention hardware.

Realpra
Hero Member
*****
Offline Offline

Activity: 815
Merit: 1000


View Profile
June 19, 2012, 08:11:20 AM
Last edit: June 19, 2012, 08:26:14 AM by Realpra
 #64

@realpra How will you connect to any node not controlled by an attacker if I the attacker control your upstream Internet and am redirecting your connection attempts to nodes I control?  You think you are connected to node X by its IP, but you are really connected to my node Y and have no way to know.
It's called SSL, I think.

Say I store the public keys for:
1. My friend Bob.
2. Guy who posted his key on a forum.
3. MtGox.

Since I am lazy that's it.

You send me invalid money and I check those nodes.

It now either becomes apparent that someone is blocking my connection OR one of them will likely NOT be colluding with you.

You can fake IPs, but you have no way to fake having their private keys for my encrypted communication, so my client will just display "You are under attack!!!".

Many people. Imagine if all you had to do today to bootstrap was to download a week of blocks. A low-end laptop can do that in less than an hour today, and that's without all of the code optimizations the Bitcoin implementations will have in the future, not to mention hardware.
I mean what customer/merchant would wait days to know whether payment was made or not?

edit: A swarm client would run on a smartphone and act as a full node, btw. It could even mine a bit in a pool.

Cheap and sexy Bitcoin card/hardware wallet, buy here:
http://BlochsTech.com
DiThi
Full Member
***
Offline Offline

Activity: 156
Merit: 100

Firstbits: 1dithi


View Profile
June 19, 2012, 03:26:32 PM
Last edit: June 19, 2012, 03:42:25 PM by DiThi
 #65

You are discussing two issues that IMHO are already resolved in my proposal or a follow-up:

Efficient tree update: The update function only recalculates the hashes affected by each block's changes. Those changes can be reversed, as long as the block is valid (i.e. there are no double-spends), so it is easy to roll back when blocks get orphaned.
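
A sketch of that update/rollback pattern over a toy key-value store standing in for the tree (Python; in the real structure only the hashes along the touched paths would be recomputed):

Code:
utxo_tree = {}    # toy stand-in for the merkle-ized unspent-output set

def apply_block(spent, created):
    """Apply one block's changes; return an undo log for orphan rollback.
    spent: outpoints removed; created: {outpoint: txout} added."""
    undo = {"spent": {}, "created": list(created)}
    for op in spent:
        undo["spent"][op] = utxo_tree.pop(op)   # remember what was deleted
    utxo_tree.update(created)
    return undo

def rollback(undo):
    for op in undo["created"]:
        del utxo_tree[op]
    utxo_tree.update(undo["spent"])              # restore the spent outputs

u1 = apply_block(spent=[], created={"tx1:0": 50})
u2 = apply_block(spent=["tx1:0"], created={"tx2:0": 20, "tx2:1": 29})
rollback(u2)                                     # the second block was orphaned
print(utxo_tree)    # {'tx1:0': 50} -- back to the pre-block state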

Where to save the roots: In my proposal I explain how to roll it out in the coinbase of the existing chain, nullifying the risk of a chain split by rejecting blocks with an invalid root only when more than 55% of the roots in a specific time span are valid.

For extra security (and this is what isn't originally in my proposal), the root should be accompanied by a hash of the previous+current valid roots, effectively making a secure blockchain from day one. But after it's deployed widely, it will be unnecessary, as we'll know miners will reject blocks with invalid roots. Miners won't reject blocks without roots. Blocks with a valid root but without this blockchain-ish hash won't be rejected either (so we can drop this hash when it's no longer necessary).

In this way, creating a separate chain is just a temporary fix for something that will be in the main chain someday.

1DiThiTXZpNmmoGF2dTfSku3EWGsWHCjwt
galambo
Sr. Member
****
Offline Offline

Activity: 966
Merit: 311



View Profile
June 19, 2012, 03:44:42 PM
 #66

This idea could end up having more uses than enabling lightweight clients.

For instance, forking the main blockchain is practically impossible today. Even if someone came around with worthwhile changes to the storage subsystem or the scripting subsystem, we could never implement them. The moment the two chains got out of sync, you would need two copies of the blockchain .dat that are mostly identical.

With this proposal a certain "snapshot" in the metachain could be specified as the branch point for the blockchain. This snapshot could be used to refer back to the legacy system.

The proposal would allow experiments and tests using the real chain. The developers have been sort of paralyzed, because there's not really any way to change many things in the implementation.

If one of these experimental branches became popular enough, a new branch could be created on the official branch with ample notice to all users.

Also, having a chain of snapshots would allow the network to avoid new and unforeseen attacks. If one user managed to do something detrimental in the block chain to his advantage and every other user's disadvantage (like a sustained 51% attack, or an exploit), the community could achieve a consensus to "go back in time" to a previous snapshot with a patched client.
DiThi
Full Member
***
Offline Offline

Activity: 156
Merit: 100

Firstbits: 1dithi


View Profile
June 19, 2012, 04:03:50 PM
 #67

About rolling out new features and avoiding blockchain splits: what we need is a good system to automatically and democratically add any feature. Just like the implementation schedule of P2SH, but more like my proposal: time-flexible, with an additional temporary sub-chain, and for any feature. It may be difficult and problematic to code it for only one feature, but IMHO it's worth it if it's a generic implementation-deprecation system for determining the validity of blocks.

1DiThiTXZpNmmoGF2dTfSku3EWGsWHCjwt
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1135


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
June 19, 2012, 04:17:30 PM
 #68

It's called SSL, I think.

I would be pretty surprised if nodes started identifying themselves through SSL certificates.

That said, however, what you appear to have proposed is tiers of nodes and a structure that includes supernodes.  I actually agree with you that such a structure will be critical to the scalability of the network.

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
etotheipi (OP)
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1093


Core Armory Developer


View Profile WWW
June 19, 2012, 04:20:01 PM
 #69

You are discussing two issues that IMHO are already resolved in my proposal or a follow-up:

Efficient tree update: The update function only recalculates the hashes affected by each block's changes. Those changes can be reversed, as long as the block is valid (i.e. there are no double-spends), so it is easy to roll back when blocks get orphaned.

Where to save the roots: In my proposal I explain how to roll it out in the coinbase of the existing chain, nullifying the risk of a chain split by rejecting blocks with an invalid root only when more than 55% of the roots in a specific time span are valid.

For extra security (and this is what isn't originally in my proposal), the root should be accompanied by a hash of the previous+current valid roots, effectively making a secure blockchain from day one. But after it's deployed widely, it will be unnecessary, as we'll know miners will reject blocks with invalid roots. Miners won't reject blocks without roots. Blocks with a valid root but without this blockchain-ish hash won't be rejected either (so we can drop this hash when it's no longer necessary).

In this way, creating a separate chain is just a temporary fix for something that will be in the main chain someday.

DiThi,

I see this from a different angle.  

(1) The tree-part of my proposal should be seen as an extension of yours.  I'm sure my idea was inspired by reading yours a long time ago.  The difference being that extra complexity is added to the tree structure to accommodate the most common use-case:  requesting address balances.  My tree structure guarantees that you can not only get any TxOut, but you can get all TxOuts for a given address/script and have no doubts that it's correct.

I believe this is a worthy trade-off compared to your tree structure, as it removes a channel of uncertainty for the operator, and removes a channel for shenanigans from those who wish to deceive you.  And in the end, it's not actually that much more complicated.  It's simply more tailored to the way that users need to access the network.

(2)  As echoed by others, I believe that a hard-forking blockchain change is only going to happen in the event of a crisis.  To do so requires more than democracy -- it will seriously impact the entire network in a detrimental way.  There are users who are still using version 0.3.X bitcoin clients not because they want to, but because it works, and they don't follow the forums or Bitcoin news or anything of the sort.  And a hard fork exposes them to all sorts of malicious behavior by others who would exploit their ignorance of current events and manipulate the abandoned chain that they are stuck on.

To maintain confidence in the system, a hard fork is going to need more than democracy -- it's going to need a super-majority, probably 80-90% ... and gaining that level of consensus is pretty much impossible for new ideas that are not well-understood -- unless the idea has been in the wild, and in use for many months/years, and is already used by 80%+ of people.

The idea of using a second blockchain is actually a way of creating a "staging area" for such ideas on the main network (like galambo said) without actually risking exposing that network to any of the unforeseen issues that could arise.  It can be used to add such functionality to the network without actually changing the network.

In this way, the meta-chain can grow and develop as people start using it and understanding it.  People start building infrastructure on the availability of the information in that chain.  Once it has become ubiquitous enough, and time-tested as a pillar of the network, you have 80%+ agreement amongst users without even having to ask for it.  At this point, a hard fork is entirely feasible -- or at least orders of magnitude less disruptive.

You're right, it's not the only way, but I think it's about as good as it's going to get.

Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
DiThi
Full Member
***
Offline Offline

Activity: 156
Merit: 100

Firstbits: 1dithi


View Profile
June 19, 2012, 05:17:47 PM
 #70

My tree structure guarantees that you can not only get any TxOut, but you can get all TxOuts for a given address/script and have no doubts that it's correct.

I always saw that as a separate issue/feature of my proposal (i.e. not necessary for starting and deploying an implementation), which also makes things simpler. Sometimes (actually, most of the time) you just need to know an output hasn't been spent. If you need the balance and someone gives you a list of outputs, you can be sure those outputs are unspent; the only thing remaining is knowing whether *all* the outputs were given to you.

That's easy to solve. I'm thinking of several solutions that don't require full nodes to build and verify the tree. For example, having a separate tree, address-based instead of chain-based, which just stores the number of unspent outputs (removing the key if the value is 0).
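
A minimal sketch of that side structure, with a dict standing in for the real tree (only the removal-at-zero rule is from the paragraph above; the rest is illustrative):

Code:
unspent_count = {}    # address -> number of unspent outputs

def add_output(address):
    unspent_count[address] = unspent_count.get(address, 0) + 1

def spend_output(address):
    unspent_count[address] -= 1
    if unspent_count[address] == 0:
        del unspent_count[address]    # key vanishes when nothing is left

add_output("1dithi-example")
add_output("1dithi-example")
spend_output("1dithi-example")
print(unspent_count)   # {'1dithi-example': 1}
# A lite node handed 1 output for this address now knows the list is complete.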

Initially, though, we can just query several nodes to give us the count of unspent outputs and trust the majority.

1DiThiTXZpNmmoGF2dTfSku3EWGsWHCjwt
etotheipi (OP)
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1093


Core Armory Developer


View Profile WWW
June 19, 2012, 05:26:32 PM
 #71

My tree structure guarantees that you can not only get any TxOut, but you can get all TxOuts for a given address/script and have no doubts that it's correct.

I always saw that as a separate issue/feature of my proposal (i.e. not necessary for starting and deploying an implementation), which also makes things simpler. Sometimes (actually, most of the time) you just need to know an output hasn't been spent. If you need the balance and someone gives you a list of outputs, you can be sure those outputs are unspent; the only thing remaining is knowing whether *all* the outputs were given to you.

That's easy to solve. I'm thinking of several solutions that don't require full nodes to build and verify the tree. For example, having a separate tree, address-based instead of chain-based, which just stores the number of unspent outputs (removing the key if the value is 0).

Initially, though, we can just query several nodes to give us the count of unspent outputs and trust the majority.

Well, that's where we differ in opinion.  Majority peer-influence is cheap relative to majority mining power.  That's not to say it's an easy exploit, or that it would be in any way worth it.  But I see it as a source of uncertainty, and a channel waiting to be exploited in some way we haven't thought about.  I think the added complexity is well worth closing the "hole" completely.  Though not everyone feels it's actually a hole.

I personally think it makes more sense anyway -- you can still get a single TxOut with O(log(N)+log(M)) if you really want it -- but most of the time, it would be new nodes hopping on the network with imported wallets that simply want to get their balance.  This tree structure takes that use case into account directly, and doesn't leave a shred of uncertainty that they got the right answer.
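
A back-of-the-envelope sketch of where that O(log(N)+log(M)) comes from (Python; the numbers are hypothetical):

Code:
import math

def proof_hashes(num_addresses, num_txouts_for_addr):
    """One branch through the outer address tree (log N) plus one branch
    through that address's own TxOut subtree (log M)."""
    outer = math.ceil(math.log2(max(num_addresses, 2)))
    inner = math.ceil(math.log2(max(num_txouts_for_addr, 2)))
    return outer + inner

n = proof_hashes(num_addresses=2**22, num_txouts_for_addr=8)
print(n, "hashes, ~", n * 32, "bytes")   # 25 hashes, ~ 800 bytes

Fetching a whole address balance instead of a single TxOut just replaces the inner branch with that address's full unspent-TxOut list, still verified against the same root.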




Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
Realpra
Hero Member
*****
Offline Offline

Activity: 815
Merit: 1000


View Profile
June 19, 2012, 06:15:12 PM
 #72

It's called SSL, I think.

I would be pretty surprised if nodes started identifying themselves through SSL certificates.
I would also be surprised if someone's upstream connection was stolen Wink

At least with SSL, once the attack became known, it would only happen once before an SSL update was made.

Quote
That said however, what it looks like you have proposed is tiers of nodes and a structure that includes supernodes.  I actually agree with you that such a structure will be critical to scalability of the network.
No, my structure could theoretically operate entirely with swarm clients.

However, in the case of mining pools, you might have one node orchestrating which hash will be worked on.

The OP of this thread has supernodes; I don't. Maybe that's where the confusion arose.

I don't like supernodes; I think they're a bad, centralized design.

Quote
DiThi
Could you give me a link to your proposal?

Cheap and sexy Bitcoin card/hardware wallet, buy here:
http://BlochsTech.com
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1135


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
June 19, 2012, 07:01:46 PM
 #73

I would also be surprised if someones upstream connection was stolen Wink

Perhaps you are unfamiliar with how the Internet works in places like Iran and China, where not only do they do MITM attacks on their citizens, they coerce SSL certificate providers to issue them bogus certificates so their citizens will be caught unaware.

Bitcoin needs to work there, too.

http://www.bgr.com/2011/08/30/iranian-government-said-to-be-using-mitm-hack-to-spy-on-gmail-other-google-services/

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
unclescrooge
aka Raphy
Hero Member
*****
Offline Offline

Activity: 868
Merit: 1000


View Profile
June 19, 2012, 07:56:05 PM
 #74

About rolling out new features and avoiding block chain splits, what we need is a good automatic system for democratically adding any feature. Just like the implementation schedule of P2SH, but more like my proposal: time-flexible, with an additional temporal sub-chain, and for any feature. It may be difficult and problematic to code it for only one feature, but IMHO it's worth it if it's a generic implementation-deprecation system for determining the validity of blocks.

Maybe I'm misunderstanding the change you're talking about, but I think this is dangerous. I use Bitcoin because I trust the protocol behind it never to change. If the majority or the devs can push a change to the protocol, then I'm out. A way to compress the blockchain, fine. A fork, hard or soft... mmmmm, seems dangerous to me.
galambo
Sr. Member
****
Offline Offline

Activity: 966
Merit: 311



View Profile
June 19, 2012, 10:02:10 PM
 #75


(2)  As echoed by others, I believe that a hard-forking blockchain change is only going to happen in the event of a crisis.  To do so requires more than democracy -- it will seriously impact the entire network in a detrimental way.  There are users who are still using version 0.3.X Bitcoin clients not because they want to, but because it works, and they don't follow the forums or Bitcoin news or anything of the sort.  And a hard fork exposes them to all sorts of malicious behavior by others who would exploit their ignorance of current events and manipulate the abandoned chain that they are stuck on.

To maintain confidence in the system, a hard fork is going to need more than democracy -- it's going to need a super-majority, probably 80-90% ... and gaining that level of consensus is pretty much impossible for new ideas that are not well understood -- unless the idea has been in the wild and in use for many months/years and is already used by 80%+ of people.

The idea of using a second blockchain is actually a way of creating a "staging area" for such ideas on the main network (like galambo said) without actually risking exposing that network to any of the unforeseen issues that could arise.  It can be used to add such functionality to the network without actually changing the network.

In this way, the meta-chain can grow and develop as people start using it and understanding it.  People will start building infrastructure on the availability of the information in that chain.  Once it has become ubiquitous enough and time-tested as a pillar of the network, then you have 80%+ agreement amongst users without even having to ask for it.  At that point, a hard fork is entirely feasible -- or at least orders of magnitude less disruptive.

You're right, it's not the only way, but I think it's about as good as it's going to get.

Thank you for your feedback. I wasn't quite sure if people would agree that this could help automate the BIP process.


Maybe I'm misunderstanding the change you're talking about, but I think this is dangerous. I use Bitcoin because I trust the protocol behind it never to change. If the majority or the devs can push a change to the protocol, then I'm out. A way to compress the blockchain, fine. A fork, hard or soft... mmmmm, seems dangerous to me.

The only way you would notice this kind of fork is if you applied the experimental patch to your Bitcoin client. Think about it like another "test net."
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1135


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
June 19, 2012, 10:26:00 PM
 #76

I actually disagree that a hard fork would be required to implement this.  A simple majority of mining power would be enough.  New blocks meeting the new requirements would still be valid blocks to the old clients, the only change being that the majority of miners would work to orphan blocks not containing the proper meta tree root, so miners mining with an old client would have an impossible time getting any blocks.

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
cantor
Newbie
*
Offline Offline

Activity: 31
Merit: 0



View Profile
June 20, 2012, 01:46:43 AM
 #77

So as I understand it, the issue here is to build a meta-chain that would contain digests of the contents of the main blockchain, in a way that would allow a lite client to query a server for information stored in the blockchain, and use the meta-chain to verify the answer.  And even the meta-chain itself would be constructed in such a way that it can be partially queried and verified using only its root hash, meaning the lite client would only need a bounded amount of storage.

I hope I'm doing a good summary of what is being discussed in this thread, considering that it's pretty late at night here Tongue  At any rate, subscribing, this looks really interesting.
etotheipi (OP)
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1093


Core Armory Developer


View Profile WWW
June 20, 2012, 02:04:14 AM
Last edit: June 20, 2012, 02:26:27 AM by etotheipi
 #78

So as I understand it, the issue here is to build a meta-chain that would contain digests of the contents of the main blockchain, in a way that would allow a lite client to query a server for information stored in the blockchain, and use the meta-chain to verify the answer.  And even the meta-chain itself would be constructed in such a way that it can be partially queried and verified using only its root hash, meaning the lite client would only need a bounded amount of storage.

I hope I'm doing a good summary of what is being discussed in this thread, considering that it's pretty late at night here Tongue  At any rate, subscribing, this looks really interesting.

Yeah, fairly accurate.  I'll re-summarize here because my view of my own proposal has evolved over discussions of the last few days, so I figured it was a good time to restate it, anyway Smiley


The first goal is blockchain pruning:  the amount of information needed to store the entire state of the network at any point in time is much less than the entire blockchain history.  You can basically just save the "outer surface" of the transaction map instead of the entire history and still do full validation.  

So I propose a structure that achieves this compression, and further organizes it to accommodate a specific, common problem we want to solve anyway:  a new node gets on the network with its imported wallet and doesn't need the whole chain, but would like to get a complete history of its own transactions in a verifiable manner.  I argue that with a more straightforward "snapshot" tree, there's still room for deception by malicious peers, albeit not a whole lot.

Either way, I believe it's possible for new nodes to use a structure like this one to get up and running with full confidence in less than 50 MB of downloading, even in the far future.  And such a solution will be necessary, so let's hash it out now...

However, for lite-nodes to reliably use the new information, there must be some kind of enforcement that miners provide correct answers when updating the root node.  This could be done by hard-forking the network by changing the headers to require a valid root node, soft-forking by requiring a specific tx or coinbase script to contain a valid root, or as I propose: create a separate chain solely to allow mining power to "vote" on the correct snapshot root.  Such a solution would then be completely optional and transparent to anyone who doesn't know or care about it -- though I would expect most miners and developers would be anxious to leverage it.

As galambo brought up -- the alt-/meta-chain idea is a kind of "staging area" for this new piece of the protocol.  Once the community starts using it and becomes dependent on the information it provides, it could be integrated into the main chain (via hard- or soft-forking) as it would have super-majority support at that point.





Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
2112
Legendary
*
Offline Offline

Activity: 2128
Merit: 1060



View Profile
June 20, 2012, 02:12:34 AM
 #79

So as I understand it, the issue here is to build a meta-chain that would contain digests of the contents of the main blockchain,
This isn't a good summary.

The issues are:

1) saving local storage space by purging spent transaction info, but at the same time maintaining cryptographic verifiability of the stored info.

2) augmenting the p2p protocol such that, in order to participate in the network, a client doesn't have to start from the genesis and work all the way up until now, but can start at now (or the close past) and go back in time only to the oldest coin dealt with in the transaction, not all the way back to the genesis.

3) relaxing the original peer-to-peer protocol to allow at least partial parasite-to-peer operation, where a parasite is a pretend-peer that doesn't fully verify the relayed information but just repeats the latest rumors really fast. The goal is to limit the possible damage caused by such network rumormongers.

My description probably isn't clearer but it is closer to the truth.

Please comment, critique, criticize or ridicule BIP 2112: https://bitcointalk.org/index.php?topic=54382.0
Long-term mining prognosis: https://bitcointalk.org/index.php?topic=91101.0
doobadoo
Sr. Member
****
Offline Offline

Activity: 364
Merit: 250


View Profile
June 20, 2012, 02:20:00 AM
 #80

Can compression be used as an intermediate step along the way?  That is, is the blockchain stored on disk in an efficient manner?  Also, why do noobs have to dl the whole block chain from peers?  It's soooo slow.  Couldn't each release come with a zipped copy of the blockchain up to the date it was released, along with a hard-coded hash of that block chain and a built-in check?  That way the user could just dl the whole shebang in one swoop.

These, of course, are not permanent measures, but they could serve as interim fixes for now.

"It is, quite honestly, the biggest challenge to central banking since Andrew Jackson." -evoorhees
tevirk
Newbie
*
Offline Offline

Activity: 15
Merit: 0



View Profile
June 20, 2012, 05:23:04 AM
 #81

Because most of the data in the block chain is hashes (which are effectively random), it's barely compressible at all.
maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1011


View Profile
June 20, 2012, 07:38:46 AM
 #82

Can compression be used as an intermediate step along the way?  That is, is the blockchain stored on disk in an efficient manner?  Also, why do noobs have to dl the whole block chain from peers?  It's soooo slow.  Couldn't each release come with a zipped copy of the blockchain up to the date it was released, along with a hard-coded hash of that block chain and a built-in check?  That way the user could just dl the whole shebang in one swoop.

These, of course, are not permanent measures, but they could serve as interim fixes for now.
Most of the sync time is spent doing ECDSA verification and disk I/O. Compression would actually make the problem worse. Packaging the block chain up to the latest checkpoint isn't a bad idea, but won't improve the situation as much as you probably think. The client would still have to verify the block chain, which means hours on end of 100% CPU utilization.

The real solution is to develop a way to avoid verification of historical data without compromising the security that verification provides. That's what this thread is about.

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
jojkaart
Member
**
Offline Offline

Activity: 97
Merit: 10


View Profile
June 20, 2012, 05:49:03 PM
 #83

About rolling out new features and avoiding block chain splits, what we need is a good automatic system for democratically adding any feature. Just like the implementation schedule of P2SH, but more like my proposal: time-flexible, with an additional temporal sub-chain, and for any feature. It may be difficult and problematic to code it for only one feature, but IMHO it's worth it if it's a generic implementation-deprecation system for determining the validity of blocks.

Maybe I'm misunderstanding the change you're talking about, but I think this is dangerous. I use Bitcoin because I trust the protocol behind it never to change. If the majority or the devs can push a change to the protocol, then I'm out. A way to compress the blockchain, fine. A fork, hard or soft... mmmmm, seems dangerous to me.

There is at least one guaranteed hard fork that is going to happen eventually. Blocks are currently limited to a maximum of 1 megabyte in size. This could start hampering further growth of Bitcoin usage starting sometime next year.

However, this doesn't mean Bitcoin devs can just push any change they want. Users and miners can always opt to not use whatever new version they put out.

In short, getting any change out requires massive majority support from Bitcoin users.
unclescrooge
aka Raphy
Hero Member
*****
Offline Offline

Activity: 868
Merit: 1000


View Profile
June 20, 2012, 06:06:14 PM
 #84

About rolling out new features and avoiding block chain splits, what we need is a good automatic system for democratically adding any feature. Just like the implementation schedule of P2SH, but more like my proposal: time-flexible, with an additional temporal sub-chain, and for any feature. It may be difficult and problematic to code it for only one feature, but IMHO it's worth it if it's a generic implementation-deprecation system for determining the validity of blocks.

Maybe I'm misunderstanding the change you're talking about, but I think this is dangerous. I use Bitcoin because I trust the protocol behind it never to change. If the majority or the devs can push a change to the protocol, then I'm out. A way to compress the blockchain, fine. A fork, hard or soft... mmmmm, seems dangerous to me.

There is at least one guaranteed hard fork that is going to happen eventually. Blocks are currently limited to a maximum of 1 megabyte in size. This could start hampering further growth of Bitcoin usage starting sometime next year.

However, this doesn't mean Bitcoin devs can just push any change they want. Users and miners can always opt to not use whatever new version they put out.

In short, getting any change out requires massive majority support from Bitcoin users.

Do we need a fork for block size? (sorry, I don't know anything about this)
jojkaart
Member
**
Offline Offline

Activity: 97
Merit: 10


View Profile
June 20, 2012, 07:06:04 PM
 #85

Do we need a fork for block size? (sorry, I don't know anything about this)

Yes, it will be a fork, because all current nodes on the network will ignore any block bigger than 1 megabyte. The best way to actually do this is to set a date (well, a block number in practice), say a year or two in the future, and release a version that will keep rejecting blocks bigger than one megabyte until that block number is reached. After that it will accept blocks bigger than one megabyte.

Then, when a block that is bigger than one megabyte is created, the nodes with client versions from after the update was made will accept the block, and all the older versions will reject it. In practice, there's unlikely to be a real fork unless a significant portion of users refuse to upgrade.
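In code terms, a scheduled rule change like that is just a height-gated validity check. A minimal sketch with made-up constants (the fork height and new limit here are purely hypothetical, for illustration):

Code:
OLD_LIMIT   = 1_000_000    # the current 1 MB rule
NEW_LIMIT   = 10_000_000   # hypothetical post-fork limit
FORK_HEIGHT = 300_000      # hypothetical flag-day block number

def block_size_ok(size_bytes, height):
    # Old nodes enforce OLD_LIMIT forever; upgraded nodes switch rules
    # at FORK_HEIGHT, which is exactly what makes oversized blocks a fork.
    limit = OLD_LIMIT if height < FORK_HEIGHT else NEW_LIMIT
    return size_bytes <= limit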

- Joel
Mageant
Legendary
*
Offline Offline

Activity: 1145
Merit: 1001



View Profile WWW
June 20, 2012, 08:10:15 PM
Last edit: June 20, 2012, 08:30:57 PM by Mageant
 #86

You wouldn't have to update the meta-chain with every individual new block that comes out of the regular blockchain.

The lightweight client that uses the meta-chain could still store a small portion of the regular blockchain, something like the latest 100 blocks (max. 100 MB); alternatively, this could be a configurable amount of storage.

Only updating the meta-chain every 100 blocks/100 MB or so would IMHO reduce the load on the meta-chain and the lightweight clients using it.

This also avoids any synchronization problems with the latest blocks out of the blockchain, and the possibility of small forks (orphaned blocks), since the meta-chain would then only synchronize with blocks that have a high number of confirmations.

cjgames.com
jojkaart
Member
**
Offline Offline

Activity: 97
Merit: 10


View Profile
June 20, 2012, 08:33:24 PM
 #87

You wouldn't have to update the meta-chain with every individual new block that comes out of the regular blockchain.

The lightweight client that uses the meta-chain could still store a small portion of the regular blockchain, something like the latest 100 blocks (max. 100 MB); alternatively, this could be a configurable amount of storage.

Only updating the meta-chain every 100 blocks/100 MB or so would IMHO reduce the load on the meta-chain and the lightweight clients using it.

This also avoids any synchronization problems with the latest blocks out of the blockchain, and the possibility of small forks (orphaned blocks), since the meta-chain would then only synchronize with blocks that have a high number of confirmations.

I don't see how it would reduce the load any to only update the meta-chain every 100 blocks. It'd just concentrate the update load on certain blocks. The planned tree structure for this would allow O(M*log N) updates, where M is the number of updated transactions and N is the total number of transactions in the tree.
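A rough sketch of the kind of structure that gives O(M*log N) batch updates (illustrative Python only, assuming a fixed capacity padded to a power of two): each update marks its root path dirty, and a flush -- whether run every block or every 100 -- re-hashes each dirty node exactly once.

Code:
import hashlib, math

def H(b):
    return hashlib.sha256(b).digest()

class BatchedMerkle:
    def __init__(self, capacity):
        self.n = 1 << math.ceil(math.log2(capacity))  # pad to a power of two
        self.nodes = [H(b'')] * (2 * self.n)          # nodes[1] is the root
        self.dirty = set()

    def set_leaf(self, i, data):
        # O(log N): store the leaf hash and mark its path to the root dirty.
        self.nodes[self.n + i] = H(data)
        j = (self.n + i) // 2
        while j >= 1 and j not in self.dirty:
            self.dirty.add(j)
            j //= 2

    def flush(self):
        # Re-hash each dirty internal node exactly once, deepest first.
        for j in sorted(self.dirty, reverse=True):
            self.nodes[j] = H(self.nodes[2 * j] + self.nodes[2 * j + 1])
        self.dirty.clear()
        return self.nodes[1]

Grouping updates only deduplicates shared ancestor paths; the total hashing work over M updates stays O(M*log N) regardless of the batching interval.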
Mageant
Legendary
*
Offline Offline

Activity: 1145
Merit: 1001



View Profile WWW
June 20, 2012, 08:48:37 PM
 #88

You wouldn't have to update the meta-chain with every individual new block that comes out of the regular blockchain.

The lightweight client that uses the meta-chain could still store a small portion of the regular blockchain, something like the latest 100 blocks (max. 100 MB); alternatively, this could be a configurable amount of storage.

Only updating the meta-chain every 100 blocks/100 MB or so would IMHO reduce the load on the meta-chain and the lightweight clients using it.

This also avoids any synchronization problems with the latest blocks out of the blockchain, and the possibility of small forks (orphaned blocks), since the meta-chain would then only synchronize with blocks that have a high number of confirmations.

I don't see how it would reduce the load any to only update the meta-chain every 100 blocks. It'd just concentrate the update load on certain blocks. The planned tree structure for this would allow O(M*log N) updates, where M is the number of updated transactions and N is the total number of transactions in the tree.

Because certain outputs could already have been re-spent within the last 100 blocks, so you could cut those out.

Also, there is the issue of avoiding blockchain forks that get orphaned, so the lightweight client would store a small portion of the regular blockchain anyway, say the last X blocks, where X is the number of blocks after which you can be relatively sure they won't get orphaned. I don't know if avoiding orphaned blocks is very important though; maybe it's not.

cjgames.com
apetersson
Hero Member
*****
Offline Offline

Activity: 668
Merit: 501



View Profile
June 20, 2012, 09:43:48 PM
 #89

Your nice graphs hosted on Dropbox are no longer working. Maybe upload them to imgur.com?
Realpra
Hero Member
*****
Offline Offline

Activity: 815
Merit: 1000


View Profile
June 20, 2012, 10:44:43 PM
 #90

The lightweight client that uses the meta-chain could still store a small portion of the regular blockchain, something like the latest 100 blocks (max. 100 MB); alternatively, this could be a configurable amount of storage.

Only updating the meta-chain every 100 blocks/100 MB or so would IMHO reduce the load on the meta-chain and the lightweight clients using it.
Seems like another flaw in this design to me.

At VISA volumes I think each block would be one gigabyte. Even at half that or less, it would break the light nodes proposed here.

This solution is patchwork; a swarm client combined with the ledger system is the only way. It doesn't have to be my design, but we will benefit a lot from swarm principles at some point.

Cheap and sexy Bitcoin card/hardware wallet, buy here:
http://BlochsTech.com
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1135


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
June 20, 2012, 11:11:56 PM
 #91

I am the one who originally suggested the 100 block interval... but I don't think I said that updating the meta tree only every 100 blocks is what should be done.

Rather, the meta tree should be rebalanced every 100 blocks, and in between, nodes should be added and deleted using a methodology that avoids (procrastinates on) having to recalculate the hashes for most or all of the nodes in the tree any time there is a change.  Otherwise, every incoming transaction will carry a huge CPU-time burden that's not sustainable.  Rebalancing the tree is much like rebuilding a database index.

The reason for 100 blocks is that there needs to be an agreement that everybody will do it at a certain time simultaneously, so that as hashes of the tree are exchanged, they will always refer to the same data set.  An arbitrary number must be chosen that strikes a balance between the resource burden of rebalancing the tree (which favors rebalancing less frequently) and the burden required of someone to get themselves up to speed (which favors rebalancing more frequently).

And while the meta tree may in fact be large, no node ever needs to accumulate older copies of it, so it is not as though having a 1 GB meta tree is going to mean 1 GB for every 100 blocks.  There is value in having a few old revisions of the meta tree (e.g. so that a node can opt to formulate its view of the network starting a certain number of blocks back and protect itself from orphan blocks), but there is no reason, for example, for anyone to accumulate this metadata for historical purposes, as it is completely reconstructable from the normal block chain.

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
tevirk
Newbie
*
Offline Offline

Activity: 15
Merit: 0



View Profile
June 21, 2012, 06:10:25 AM
 #92

Sorry to be slow, but I don't see the gain here.  If a lightweight client is going to trust that a metablock that's been merged into the chain is truthful (because it's been built into a block), then it can just as reliably trust that a transaction that's in the chain a few blocks back is valid, because it's been built into a block.  There's no need for it to keep anything.  The only real advantage here seems to be that it saves miners from having to have a hard disk, and it seems like a lot of engineering to do that.

Quite possibly I'm missing something, in which case it would probably help for someone to step back and explain the aims and benefits.
maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1011


View Profile
June 21, 2012, 08:00:53 AM
 #93

I am the one who originally suggested the 100 block interval... but I don't think I said that updating the meta tree only every 100 blocks is what should be done.

Rather, the meta tree should be rebalanced every 100 blocks, and in between, nodes should be added and deleted using a methodology that avoids (procrastinates on) having to recalculate the hashes for most or all of the nodes in the tree any time there is a change.  Otherwise, every incoming transaction will carry a huge CPU-time burden that's not sustainable.  Rebalancing the tree is much like rebuilding a database index.
I'm not sure I follow. Updating any tree (balanced or not) is a constant-time operation. Updating any Merkle tree (balanced or not) is log(N), although you can save a little effort by marking updated nodes as dirty and only updating Merkle hashes at the end. Rebuilding a database index is N*log(N), a different beast. Anyway, that's beside the point. Updating a balanced Merkle tree shouldn't be any more complex, algorithmically, than leaving it unbalanced. Unless I'm missing something; am I?

Sorry to be slow, but I don't see the gain here.  If a lightweight client is going to trust that a metablock that's been merged into the chain is truthful (because it's been built into a block), then it can just as reliably trust that a transaction that's in the chain a few blocks back is valid, because it's been built into a block.  There's no need for it to keep anything.  The only real advantage here seems to be that it saves miners from having to have a hard disk, and it seems like a lot of engineering to do that.

Quite possibly I'm missing something, in which case it would probably help for someone to step back and explain the aims and benefits.
Wrong problem. It saves the lightweight client from having to download, verify, and keep track of any of the block chain at all, except for those parts the user cares about (their own unspent outputs, for example).

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1135


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
June 21, 2012, 04:01:12 PM
 #94

Sorry to be slow, but I don't see the gain here.  If a lightweight client is going to trust that a metablock that's been merged into the chain is truthful (because it's been built into a block), then it can just as reliably trust that a transaction that's in the chain a few blocks back is valid, because it's been built into a block.  There's no need for it to keep anything.  The only real advantage here seems to be that it saves miners from having to have a hard disk, and it seems like a lot of engineering to do that.

Quite possibly I'm missing something, in which case it would probably help for someone to step back and explain the aims and benefits.

That works if, as a lightweight node, you plan only on receiving funds that have a very small number of confirmations, which eliminates your view of the majority of bitcoins that exist.  In USD terms, this would be like limiting yourself to only being able to accept crisp dollar bills that have never been handled more than once or twice.  More likely than not, you're going to need to be able to receive funds from anybody, which will have been confirmed anywhere on the block chain between the genesis block and now.  You either need the whole block chain to know whether a given incoming transaction is valid, or at least the digested tree of all unspent txouts for the entire block chain.

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1135


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
June 21, 2012, 04:07:48 PM
 #95

I am the one who originally suggested the 100 block interval... but I don't think I said that updating the meta tree only every 100 blocks is what should be done.

Rather, the meta tree should be rebalanced every 100 blocks, and in between, nodes should be added and deleted using a methodology that avoids (procrastinates on) having to recalculate the hashes for most or all of the nodes in the tree any time there is a change.  Otherwise, every incoming transaction will carry a huge CPU-time burden that's not sustainable.  Rebalancing the tree is much like rebuilding a database index.
I'm not sure I follow. Updating any tree (balanced or not) is a constant-time operation. Updating any Merkle tree (balanced or not) is log(N), although you can save a little effort by marking updated nodes as dirty and only updating Merkle hashes at the end. Rebuilding a database index is N*log(N), a different beast. Anyway, that's beside the point. Updating a balanced Merkle tree shouldn't be any more complex, algorithmically, than leaving it unbalanced. Unless I'm missing something; am I?

My understanding is that removing leaf nodes from a sorted Merkle tree, while maintaining the constraint that the tree remain sorted and balanced, has the potential to cause the hash of every node to be recalculated.  Imagine going from a tree that has 513 nodes to one that has 512 nodes.  The tree will lose a whole rank, and 100% of its hashes will change.  That's an extreme case, but not far from the typical case: if you remove a leaf out of the middle and don't replace it with a placeholder, all the leaf nodes to its right will shift left by one position to maintain the sort and balance constraints, and every parent of any node that has shifted will be recalculated.

The closer the removal is to the left side of the tree, the greater the proportion of the tree that must be recalc'd.  A recalc of a tree in the hundreds of MB or in the GBs for every incoming Bitcoin transaction would be unsustainably and unscalably expensive.  All transactions result in the spending, and therefore the deletion, of at least one leaf node, so this kind of update would be CPU-intensive for every user upon every roll of Satoshi's dice.  So my idea -- to keep the tree view consistent and keep the updating to log(N) -- would be to only balance the tree on a predetermined interval, and at any point in between, use placeholders and allow leaf nodes to become branches (both of which -- especially the latter -- would make the tree no longer a proper Merkle tree) to conserve CPU resources during updates.
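The scale of the problem is easy to demonstrate with a toy script (illustrative only, using a naive sorted-array-of-leaves Merkle construction): delete one leaf near the left edge and count how many internal hashes change.

Code:
import hashlib

def H(b):
    return hashlib.sha256(b).digest()

def internal_hashes(leaves):
    # Every internal hash of a naive bottom-up Merkle tree over the leaves.
    layer = [H(x) for x in leaves]
    out = []
    while len(layer) > 1:
        if len(layer) % 2:
            layer.append(layer[-1])
        layer = [H(layer[i] + layer[i + 1]) for i in range(0, len(layer), 2)]
        out.extend(layer)
    return out

leaves = [b'txout%04d' % i for i in range(513)]
before = set(internal_hashes(leaves))
after = internal_hashes(leaves[:10] + leaves[11:])  # delete a left-side leaf
changed = sum(1 for h in after if h not in before)
print('%d of %d internal hashes must be recomputed' % (changed, len(after)))

Running this shows the large majority of the internal hashes changing for a single deletion, which is the cost the placeholder scheme is meant to dodge.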


Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
jojkaart
Member
**
Offline Offline

Activity: 97
Merit: 10


View Profile
June 21, 2012, 04:13:07 PM
 #96

My understanding is that removing leaf nodes from a sorted Merkle tree, while maintaining the constraint that the tree remain sorted and balanced, has the potential to cause the hash of every node to be recalculated.  Imagine going from a tree that has 513 nodes to one that has 512 nodes.  The tree will lose a whole rank.  That's an extreme case, but not far from the typical case: if you remove a leaf out of the middle and don't replace it with a placeholder, all the leaf nodes to its right will shift left by one position to maintain the sort and balance constraints, and every parent of any node that has shifted will be recalculated.  The closer the removal is to the left side of the tree, the greater the proportion of the tree that must be recalc'd.  A recalc of a tree in the hundreds of MB or in the GBs for every incoming Bitcoin transaction would be overbearingly expensive.  All transactions result in the spending, and therefore the deletion, of at least one leaf node, so this kind of update would be CPU-intensive for every roll of Satoshi's dice.  So my idea -- to keep the tree view consistent and keep the updating to log(N) -- would be to only balance the tree on a predetermined interval, and at any point in between, use placeholders and allow leaf nodes to become branches to conserve updating resources.

Not quite true. For example, the red-black tree algorithm guarantees worst-case operation of O(log N). Most branches, although they will move, will keep their hashes as they were, so there's no need to recalculate.

- Joel
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1135


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
June 21, 2012, 04:16:19 PM
 #97


Not quite true. For example, the red-black tree algorithm guarantees worst-case operation of O(log N). Most branches, although they will move, will keep their hashes as they were, so there's no need to recalculate.

- Joel

The red-black tree is a concept I don't yet understand.  But if choosing this type of structure brings the benefit of O(log N) updates without introducing any negatives, then I'm all for it, and of course the periodic rebalance would become unnecessary.

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
EnergyVampire
Full Member
***
Offline Offline

Activity: 210
Merit: 100



View Profile
June 21, 2012, 04:56:21 PM
Last edit: June 26, 2012, 11:30:55 PM by EnergyVampire
 #98

Subscribing

Added to Watchlist: https://bitcointalk.org/index.php?topic=90136.0

etotheipi (OP)
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1093


Core Armory Developer


View Profile WWW
June 21, 2012, 05:18:11 PM
 #99


Not quite true. For example, the red-black tree algorithm guarantees worst-case operation of O(log N). Most branches, although they will move, will keep their hashes as they were, so there's no need to recalculate.

- Joel

The red-black tree is a concept I don't yet understand.  But if choosing this type of structure brings the benefit of O(log N) updates without introducing any negatives, then I'm all for it, and of course the periodic rebalance would become unnecessary.

This is a topic I've been debating with folks on IRC the past couple of days.  It's clear that most tree data structures have most of the properties we want.  Red-black trees work great, but I don't like that the specific underlying structure (and thus the root hash) depends on the specific order/history of insertions and deletions.  It also assumes that every red-black implementation uses the same rebalancing algorithm.  I have been given compelling reasons why this shouldn't be a problem, but I am personally not convinced yet.  Though I agree that it is probably an acceptable solution.

I'm voting for level-compressed trie structures (so, a variant of Patricia trees), which have no balance issues at all, are insert-order-independent, and offer O(1) query/insert/delete.  The problem is they can have a lot of storage overhead per tree element.  I haven't done the calculation to know for sure just how bad it is.

Once I get out my next version of Armory, I will be diving into this a bit more, hopefully creating a proposal with much more specifics about tree structure, and the CONOPs (concept of operations) of the meta-chain.


Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1135


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
June 21, 2012, 05:30:18 PM
 #100

This is a topic I've been debating with folks on IRC the past couple of days.  It's clear that most tree data structures have most of the properties we want.  Red-black trees work great, but I don't like that the specific underlying structure (and thus the root hash) depends on the specific order/history of insertions and deletions.  It also assumes that every red-black implementation uses the same rebalancing algorithm.  I have been given compelling reasons why this shouldn't be a problem, but I am personally not convinced yet.  Though I agree that it is probably an acceptable solution.

Regardless of the tree structure chosen, why not rebuild it every 100 blocks, just for the sole purpose of having a periodic way of deterministically regenerating the tree, and to avoid mandating a continuous dependency on all prior versions of the meta tree in order to be certain that one has the "correct" permutation of the meta tree?

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1011


View Profile
June 21, 2012, 05:44:05 PM
 #101

@etotheipi, what about non-standard transactions? Including IP, P2SH and future contract formats. Not all outputs can be reduced to an address. We've been speaking loosely about a tree of “addresses”, but it would really have to be a tree of output scripts, so it's not going to be possible to limit search-string length for the prefix trie.

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1135


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
June 21, 2012, 05:53:47 PM
 #102

@etotheipi, what about non-standard transactions? Including IP, P2SH and future contract formats. Not all outputs can be reduced to an address. We've been speaking loosely about a tree of “addresses”, but it would really have to be a tree of output scripts, so it's not going to be possible to limit search-string length for the prefix trie.

I would think that all that matters is that there be a deterministic index that can be used to look it up.

P2SH has a hash.  IP, to the best of my knowledge, isn't a kind of transaction, but is just a way to produce a pubkey-based transaction (from which an address/hash can be derived).  Transaction formats yet to be invented could easily stipulate some way of being found, if a simple default of "first hash in the script, or hash of [first constant | all concatenated constants] in the script bigger than X bits, whichever comes first" didn't solve most or all cases with a single broad stroke.  (For example, if such a default didn't make sense for a future transaction type, that future transaction type could contain a field that says "My Search Key is X".)

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
vuce
Sr. Member
****
Offline Offline

Activity: 476
Merit: 250


View Profile
June 21, 2012, 05:56:35 PM
Last edit: June 21, 2012, 06:12:32 PM by vuce
 #103

I'm voting for level-compressed trie structures (so, a variant of Patricia trees), which have no balance issues at all, are insert-order-independent, and offer O(1) query/insert/delete.
Quote
To insert a string, we search the trie until we can make no further progress. At this point we either add a new outgoing edge labeled with all remaining characters in the input string, or if there is already an outgoing edge sharing a prefix with the remaining input string, we split it into two edges (the first labeled with the common prefix) and proceed.
So new insertions go to the leaves of the tree; I think this would make it insert-order dependent -- just like any other tree. I'd suggest AVL instead of R-B, since it has a lower worst-case height.
CoinLab
Sr. Member
****
Offline Offline

Activity: 270
Merit: 250


1CoinLabF5Avpp5kor41ngn7prTFMMHFVc


View Profile WWW
June 21, 2012, 08:34:05 PM
 #104

This is a very interesting idea.  Excited to see how it develops.
etotheipi (OP)
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1093


Core Armory Developer


View Profile WWW
June 21, 2012, 08:42:30 PM
 #105

@etotheipi, what about non-standard transactions? Including IP, P2SH and future contract formats. Not all outputs can be reduced to an address. We've been speaking loosely about a tree of “addresses”, but it would really have to be a tree of output scripts, so it's not going to be possible to limit search-string length for the prefix trie.

I was expecting that the hash of the TxOut script would be used, so that all keys are exactly 32 bytes.  You could argue that's exactly what most TxOut scripts already are: hashes of longer data fields (such as using hash160s in place of public keys), but you have to make sure the search key is strictly bounded in size if you're using a trie of some sort.

@vuce
The structure of a trie has no dependence on insert order.  Given a set of data, there is only one trie that can hold it.  The same goes for Patricia tries (which are level-compressed tries).  And given that its query, insert and delete times are based strictly on key size (which will be bounded as above), there are no balance issues at all: it takes at most 32 "hops" to get from the root to the leaf you want, regardless of whether you are querying, inserting or deleting.  So given a fixed key length, all those operations are actually O(1).

On the other hand, I was hoping for a structure that wasn't too complicated, and both RB trees and Patricia tries have complicated implementations (even though the concepts behind them are fairly simple).  But if we're going to have to go with something complicated anyway (to limit worst-case space and time performance), then I'd have to vote for a Patricia trie or a variant.  Not only is it O(1)... someone brought up the very good point that updates to the tree can mostly be parallelized.  That sounds like another good property for a tree that's going to have very high update rates...

I just gotta spend some time to figure out the space overhead for storing TxOuts.  If it's going to triple the overall disk space compared to other structures, it might be worth using one of the insert-order dependent trees.

What other data structures am I missing that could be considered?  I know B-trees would be a good choice if we are going with an insert-order-dependent structure: they are easy to keep balanced with fairly simple balancing rules.
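As a quick empirical check of the insert-order independence, here is a toy byte-at-a-time trie (illustrative only -- uncompressed, so not a real Patricia implementation) keyed by script hashes: inserting the same set in two different orders yields the identical root hash.

Code:
import hashlib

def H(b):
    return hashlib.sha256(b).digest()

def trie_insert(node, key, value):
    # node is a dict: one byte -> child subtrie; b'' -> stored value.
    if not key:
        node[b''] = value
        return
    trie_insert(node.setdefault(key[:1], {}), key[1:], value)

def trie_hash(node):
    # Children are hashed in sorted key order, so the digest depends only
    # on the set of (key, value) pairs, never on insertion history.
    parts = []
    for k in sorted(node):
        parts.append(k + (node[k] if k == b'' else trie_hash(node[k])))
    return H(b''.join(parts))

items = [(H(b'txout-script-%d' % i), b'utxo-subtree-root-%d' % i)
         for i in range(16)]
t1, t2 = {}, {}
for k, v in items:
    trie_insert(t1, k, v)
for k, v in reversed(items):
    trie_insert(t2, k, v)
assert trie_hash(t1) == trie_hash(t2)  # identical root either way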

Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
vuce
Sr. Member
****
Offline Offline

Activity: 476
Merit: 250


View Profile
June 21, 2012, 08:47:07 PM
Last edit: June 21, 2012, 09:22:18 PM by vuce
 #106

@vuce
The structure of a trie has no dependence on insert order.  Given a set of data, there is only one trie that can hold it.  The same goes for Patricia tries (which are level-compressed tries).  And given that its query, insert and delete times are based strictly on key size (which will be bounded as above), there are no balance issues at all: it takes at most 32 "hops" to get from the root to the leaf you want, regardless of whether you are querying, inserting or deleting.  So given a fixed key length, all those operations are actually O(1).

I was citing the Patricia trie wiki, where it's pretty obvious that new inserts are inserted as leaves of the tree, therefore making them insert-order dependent. If you could direct me to a better explanation, I would appreciate it.

Never mind, I misunderstood how it works  Embarrassed FWIW, I agree, I think this might be the best choice.
Quote
What other data structures am I missing that could be considered?  I know B-trees would be a good choice if we are going with insert-order-dependent structure:  they are easy to keep balanced with fairly simple rules for balancing.

The AVL tree is the mother of balanced binary trees. They have the smallest "worst-case height", so the fastest queries, but slightly slower insert/delete than red-black trees. They are also very easy to implement.

A 2-3-4 tree might also be worth considering. I don't know if it's insert-order-independent or not, but at a quick look it might be. Or maybe a plain 2-3 tree; that one has data in the leaves only, so it looks kind of like a Merkle tree, but it does have quite a bit of overhead.
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1135


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
June 21, 2012, 08:55:14 PM
 #107

I would consider height to be a worthy thing to control, not so much for query speed, but because the nodes from leaf to ancestor might have to be transmitted across the network in response to a query.

Also, as we discuss these tree types, I want to make sure we are not straying far from the definition of a Merkle tree, to maintain the desirable property of being able to prove that key x is or is not in the tree by providing all of the hashes necessary to climb to the root. All of these nifty tree types that put data in the branches rather than the leaf nodes may not retain that important property.  I read about red-black trees on Wikipedia and noticed the data does not go in the leaf nodes, and I cannot clearly see how I could clip part of that tree, hand it to someone, and have them be able to trust my clipping via a Merkle root.
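For reference, this is the property that has to be preserved: the server returns the leaf plus the sibling hash at each level, and the client folds them up to a root it already trusts. A minimal sketch (illustrative only), assuming data lives at the leaves and each proof step records which side the sibling is on:

Code:
import hashlib

def H(b):
    return hashlib.sha256(b).digest()

def verify_branch(leaf_data, branch, trusted_root):
    # branch: list of (sibling_hash, sibling_is_left) pairs, leaf to root.
    h = H(leaf_data)
    for sibling, sibling_is_left in branch:
        h = H(sibling + h) if sibling_is_left else H(h + sibling)
    return h == trusted_root

A structure that keeps data in the branch nodes can still commit to it, but this proof format no longer falls out for free -- which is the concern above.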

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
2112
Legendary
*
Offline Offline

Activity: 2128
Merit: 1060



View Profile
June 21, 2012, 09:28:52 PM
 #108

The AVL tree is the mother of balanced binary trees. They have the smallest "worst-case height", so the fastest queries, but slightly slower insert/delete than red-black trees.
I just wanted to point out that query speed is pretty much immaterial. All that matters is the update complexity.

Integrating over the world population of Bitcoin clients, the probability of any particular key being queried is almost 0, but the probability of any particular key being inserted/deleted is almost 1. This is pretty much the exact opposite of the assumptions made in all the classic information storage and retrieval texts.

If you come up with a really good storage tree with low overhead for insert/delete but bad query performance you can easily fix it by maintaining a secondary index structure that facilitates fast query for keys that are locally interesting. That secondary structure may be different for each individual client and be dependent on the local querying behavior.

Please comment, critique, criticize or ridicule BIP 2112: https://bitcointalk.org/index.php?topic=54382.0
Long-term mining prognosis: https://bitcointalk.org/index.php?topic=91101.0
2112
Legendary
*
Offline Offline

Activity: 2128
Merit: 1060



View Profile
June 21, 2012, 09:44:50 PM
 #109

I am the one who originally suggested the 100 block interval... but I don't think I said that updating the meta tree only every 100 blocks is what should be done.
Also, I urge you to seriously consider batch-updating the primary storage structure, and to keep recently-heard-of updates in a separate storage area. This should probably be somewhat similar to the generational garbage collection concept.

I would also urge you to avoid using 100 and instead choose a divisor of 2016.

Please comment, critique, criticize or ridicule BIP 2112: https://bitcointalk.org/index.php?topic=54382.0
Long-term mining prognosis: https://bitcointalk.org/index.php?topic=91101.0
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1135


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
June 21, 2012, 09:47:25 PM
 #110

I am the one who originally suggested the 100 block interval... but I don't think I said that updating the meta tree only every 100 blocks is what should be done.
Also, I urge you to seriously consider batch-updating the primary storage structure, and to keep recently-heard-of updates in a separate storage area. This should probably be somewhat similar to the generational garbage collection concept.

I would also urge you to avoid using 100 and instead choose a divisor of 2016.

If you mean a factor of 2016, how about 21? (21x96=2016.) That's also a clean factor of 21 million, as well as of the 210,000 blocks between reward changes.

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
2112
Legendary
*
Offline Offline

Activity: 2128
Merit: 1060



View Profile
June 21, 2012, 10:08:06 PM
 #111

If you mean a factor of 2016, how about 21? (21x96=2016.) That's also a clean factor of 21 million, as well as of the 210,000 blocks between reward changes.
I think it is too early to make this decision. I just wanted to stress that the heaviest housekeeping updates should be phase-shifted with respect to the difficulty retarget. In other words, the blocks just before and just after the retarget should involve only light housekeeping.

I haven't seen anyone doing any serious game-theoretic analysis of the possible splitting attacks on the global Bitcoin network during the retarget, but I just want to avoid creating additional headaches resulting from batch updates.

Please comment, critique, criticize or ridicule BIP 2112: https://bitcointalk.org/index.php?topic=54382.0
Long-term mining prognosis: https://bitcointalk.org/index.php?topic=91101.0
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1135


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
June 21, 2012, 10:20:30 PM
 #112

I just wanted to confirm that we understand that when the network "retargets", or the block reward halves, it isn't actually doing anything computationally intensive.  All that happens is that the client expects to see a different number in a field in future blocks as compared to past blocks.

Rebuilding a tree isn't too computationally intensive to have happen once every 3.5 hours (which is what 21 blocks would suggest -- and is also a relatively fixed interval).  All that matters is that we know it is too computationally intensive to happen per transaction (which is multiple times per minute and tends to infinity).  I can't imagine that picking this number is premature, as whether the magic number is 1, 3, 7, 21, or 420 blocks, all choices accomplish the goal of not pegging users' CPUs at 100% chugging through this tree.

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1011


View Profile
June 22, 2012, 01:14:11 AM
Merited by ABCbits (1)
 #113

Quote from: etotheipi
On the other hand, I was hoping for a structure that wasn't too complicated, and both RB trees and Patricia tries have complicated implementations (even though the concepts behind them are fairly simple).  But if we're going to have to go with something complicated, anyway (to limit worst-case speed and time performance), then I'd have to vote for Patricia trie or variant.  Not only is it O(1)... someone brought up the very good point that updates to the tree can mostly be parallelized.  That sounds like another good property of a tree that's going to have very high update rates...
The 2-3-4 tree (aka B-tree of order 4) is really the only one I would consider in this circumstance. RB-trees are actually a special representation of 2-3-4 trees, but the implementation choices in balancing an RB-tree don't exist for 2-3-4 trees (for a given 2-3-4 tree there can exist more than one RB-tree that “encodes” that structure, but not vice versa). A higher-order B-tree would also work, but then you would be trading increased CPU time for decreased I/O, which doesn't fit this application.

That said, the ability to parallelize prefix/radix tries is a very good point. You might win me over to that side yet... but if self-balancing trees are to be used, the 2-3-4 tree has some clear benefits over others (AVL, RB, etc.).
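For readers who haven't worked with them, here is a minimal Python sketch of a 2-3-4 node and its invariants (structure only; the balancing/split logic that keeps it shallow is omitted):

Code:
# A 2-3-4 tree node holds 1-3 sorted keys; an internal node has exactly
# len(keys)+1 children, one subtree per key gap.
class Node234:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []

    def check_invariants(self):
        assert 1 <= len(self.keys) <= 3
        assert self.keys == sorted(self.keys)
        assert not self.children or len(self.children) == len(self.keys) + 1

left, right = Node234([1, 2]), Node234([5, 7, 9])
root = Node234([4], [left, right])
for n in (root, left, right):
    n.check_invariants()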


To the other posts... why would you ever need to rebuild the tree? I don't understand the purpose. If you are using a self-balancing structure then it stays balanced “for free”. And under what circumstance would you have all the transaction data, but not an existing tree structure or the block chain from which you can order the updates?

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1135


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
June 22, 2012, 02:22:26 AM
 #114

If you are using a self-balancing structure then it stays balanced “for free”. And under what circumstance would you have all the transaction data, but not an existing tree structure or the block chain from which you can order the updates?

If you have a self-balancing structure, you may not have a merkle tree, which is what you'd want as a node so you can serve lookups to lite clients that have no block chain, letting them determine with 100% certainty, given just the merkle root, whether you're telling the truth.  What you have instead with all these neat tree ideas is an index to speed up database ops - which would be great if we were building a database engine - but the title of the OP is "trust-free lite nodes".  I am not sure that, in enumerating all of these other wonderful tree structures, we are remembering that we need the cryptographic properties a merkle tree offers to accomplish the stated goal.

Assuming you had a copy of such a tree... the circumstance one would have possession of it, is one as a node would have acquired it from a peer as a complete and total substitute for the block chain (other than the block headers and perhaps a few days of recent blocks).  The whole point of this exercise is to ditch the storage requirement of spent transactions from the block chain for the purpose of scaling Bitcoin and dealing with the fact that the block chain will soon be too heavy to lift - not so much to shave milliseconds off index lookups.

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
etotheipi (OP)
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1093


Core Armory Developer


View Profile WWW
June 22, 2012, 02:46:27 AM
 #115

If you are using a self-balancing structure then it stays balanced “for free”. And under what circumstance would you have all the transaction data, but not an existing tree structure or the block chain from which you can order the updates?

If you have a self-balancing structure, you may not have a merkle tree, ...

Any of these can be made into an authenticated data structure.  Each node, including non-leaf nodes, may represent data, so you just append the node's data to the hashes of its non-null children and hash that to get the current node's value.  Its parent nodes then do the same to aggregate their children.

I haven't defined it rigorously, but it can be done.  One issue with a 256-way Patricia/radix tree is that verifying a node at a given level requires the values of the other 255 children at that level.  Granted, it only matters at the top levels where there's a lot of branching; beyond levels 4 and 5 basically all other child pointers will be null.  But it goes along with why I'm hesitant to endorse a Patricia tree: there might be a lot of data to transfer just to verify a node.
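Here is a minimal sketch of that hashing rule over a generic n-ary node (Python; the dict layout is purely illustrative):

Code:
import hashlib

# A node's hash commits to its own data plus the hashes of its non-null
# children, which turns any search structure into an authenticated one.
def node_hash(data, children):
    h = hashlib.sha256()
    h.update(data)
    for child in children:
        if child is not None:            # null pointers contribute nothing
            h.update(child["hash"])
    return h.digest()

def make_node(data, children=()):
    children = list(children)
    return {"data": data, "children": children,
            "hash": node_hash(data, children)}

leaf_a = make_node(b"unspent-txout-A")
leaf_b = make_node(b"unspent-txout-B")
root = make_node(b"interior-data", [leaf_a, None, leaf_b])
print(root["hash"].hex())

This is exactly where the 256-way concern bites: recomputing a parent's hash requires the hashes of all of that parent's populated children.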



Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
2112
Legendary
*
Offline Offline

Activity: 2128
Merit: 1060



View Profile
June 22, 2012, 03:00:20 AM
 #116

why would you ever need to rebuild the tree?
Probably instead of "rebuilding the tree" the better phrase would be "rehashing the nodes on the update path in the tree". The rehashing of many short strings would completely dominate the cost of the update (insertion/deletion). This is one of those things that the O() notation doesn't convey.

The benefits of a short-and-fat data structure over a tall-and-lean are four-fold:

1) less rehash overhead after update
2) less latency sensitivity when queried level-wise over the p2p network (at the expense of wasted bandwidth)
3) lower write amplification if this structure is stored locally in the block-storage device
4) ability to batch-update the nodes incurring single rehash overhead for multiple inserts/deletions.

Ultimately, from the standpoint of long-term code stability it would be better to choose a more generic structure instead of a 2-3-4 tree. If we choose an N-to-M tree we could set the N and M values based on experimentation now, and possibly change them easily in the future if experience shows that our initial guess was not so good.
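As a toy illustration of points (1) and (4) above - buffering updates and paying the rehash cost once per batch - here is a Python sketch in which a flat sorted-leaf Merkle root stands in for whatever tree structure is eventually chosen:

Code:
import hashlib

def H(b):
    return hashlib.sha256(b).digest()

def merkle_root(leaves):
    level = [H(x) for x in leaves] or [H(b"")]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])          # duplicate last on odd levels
        level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

class BatchedUtxoSet:
    def __init__(self):
        self.items, self.pending_add, self.pending_del = set(), set(), set()

    def insert(self, x):                     # cheap: no rehashing yet
        self.pending_add.add(x)

    def delete(self, x):
        self.pending_del.add(x)

    def flush(self):                         # one rehash per batch
        self.items |= self.pending_add
        self.items -= self.pending_del
        self.pending_add.clear()
        self.pending_del.clear()
        return merkle_root(sorted(self.items))

s = BatchedUtxoSet()
for i in range(100):
    s.insert(b"utxo-%d" % i)
print(s.flush().hex())                       # hashed once, not 100 times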

Please comment, critique, criticize or ridicule BIP 2112: https://bitcointalk.org/index.php?topic=54382.0
Long-term mining prognosis: https://bitcointalk.org/index.php?topic=91101.0
tevirk
Newbie
*
Offline Offline

Activity: 15
Merit: 0



View Profile
June 22, 2012, 05:05:48 AM
 #117

Sorry to be slow, but I don't see the gain here.  If a lightweight client is going to trust that a metablock that's been merged into the chain is truthful (because it's been built into a block), then it can just as reliably trust that a transaction that's in the chain a few blocks back is valid, because it's been built into a block.


That works if, as a lightweight node, you plan only on receiving funds that have a very small number of confirmations, which eliminates your view of the majority of bitcoins that exist.  In USD terms, this would be like limiting yourself to only being able to accept crisp dollar bills that have never been handled more than once or twice.  More likely than not, you're going to need to be able to receive funds from anybody, which will have been confirmed anywhere on the block chain between the genesis block and now.  You either need the whole block chain to know whether a given incoming transaction is valid, or at least the digested tree of all unspent txouts for the entire block chain.

I'm not talking about the inputs to the transaction that pays me, I'm talking about the transaction that pays me itself. Fred posts a transaction  which he says pays me 100 bitcoins. If I have the digested tree, or the whole block chain, I can check the transaction is valid. If I don't, I can't. But either way, it's still unverified. I'm going to wait for confirmations. Once I have confirmations, then I know it's valid too, because if I'm trusting the miners not to include fictional unspent txout digests I might just as well trust them not to include invalid transactions.

The problem this mechanism solves - validating an unverified transaction in a lightweight node - doesn't look to me like a very important one.
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1135


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
June 22, 2012, 05:30:01 AM
 #118

I'm not talking about the inputs to the transaction that pays me, I'm talking about the transaction that pays me itself. Fred posts a transaction  which he says pays me 100 bitcoins. If I have the digested tree, or the whole block chain, I can check the transaction is valid. If I don't, I can't. But either way, it's still unverified. I'm going to wait for confirmations. Once I have confirmations, then I know it's valid too, because if I'm trusting the miners not to include fictional unspent txout digests I might just as well trust them not to include invalid transactions.

The problem this mechanism solves - validating an unverified transaction in a lightweight node - doesn't look to me like a very important one.


Sure - if you aren't mining, if you like spam, and if you don't see any value in reliably knowing within the hour that you have received funds from Fred (since otherwise you can't tell the difference between a bogus transaction from Fred with fake inputs and a real one).  You might be willing and able to wait 6 confirmations before deciding others have paid you, but others won't.

Under this proposal, miners can reliably mine using these trees and not the block chain.  If you think all miners will want to lug around a block chain whose size tends closer to infinity with each person who starts running a gambling bot, then sure, this isn't important.

Being able to validate a transaction instantly is important for spam prevention.  Nodes only relay valid transactions.  If you can't validate transactions, you have no choice but to blindly spew to your peers anything any of them sends.  You'll be a sitting duck for DoS attacks (since for every 1 message coming in you'll nominally send 7 out), and a whole network made of nodes like this would be easy to spam into oblivion.

Finally, this tree proposal isn't meant to RUN on a lightweight node.  It is meant to make a normal node be able to SERVE another lightweight node, at the same time not having to have the full unabridged block chain.

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
tevirk
Newbie
*
Offline Offline

Activity: 15
Merit: 0



View Profile
June 22, 2012, 06:08:09 AM
 #119

I'm not talking about the inputs to the transaction that pays me, I'm talking about the transaction that pays me itself. Fred posts a transaction  which he says pays me 100 bitcoins. If I have the digested tree, or the whole block chain, I can check the transaction is valid. If I don't, I can't. But either way, it's still unverified. I'm going to wait for confirmations. Once I have confirmations, then I know it's valid too, because if I'm trusting the miners not to include fictional unspent txout digests I might just as well trust them not to include invalid transactions.

The problem this mechanism solves - validating an unverified transaction in a lightweight node - doesn't look to me like a very important one.


Sure - if you aren't mining, if you like spam, and if you don't see any value in reliably knowing within the hour that you have received funds from Fred (since otherwise you can't tell the difference between a bogus transaction from Fred with fake inputs and a real one).  You might be willing and able to wait 6 confirmations before deciding others have paid you, but others won't.

Under this proposal, miners can reliably mine using these trees and not the block chain.  If you think all miners will want to lug around a block chain whose size tends closer to infinity with each person who starts running a gambling bot, then sure, this isn't important.

Being able to validate a transaction instantly is important for spam prevention.  Nodes only relay valid transactions.  If you can't validate transactions, you have no choice but to blindly spew to your peers anything any of them sends.  You'll be a sitting duck for DoS attacks (since for every 1 message coming in you'll nominally send 7 out), and a whole network made of nodes like this would be easy to spam into oblivion.

Finally, this tree proposal isn't meant to RUN on a lightweight node.  It is meant to make a normal node be able to SERVE another lightweight node, at the same time not having to have the full unabridged block chain.


So the scenario in which this helps is where

(a) transaction volume is so high that even miners running fancy purpose-built mining rigs can't store the transaction history on a standard-issue 1TB hard drive
(b) every Tom, Dick and Harry runs a lightweight node which relays every single transaction on a P2P network.

Those two conditions contradict each other.  If transaction rate goes up that high (and I think it shouldn't, but that's an entirely different discussion), bandwidth becomes the limiting factor before storage space does. At that transaction rate, inevitably the bitcoin network evolves to a backbone of heavy nodes exchanging everything and lightweight clients which consume only data of interest to them. That's quite independent of how the history is handled.

As to unconfirmed transactions, are there really going to be that many people who will accept an unconfirmed transaction, but not be willing to trust anyone to validate it for them?
maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1011


View Profile
June 22, 2012, 07:21:48 AM
 #120

Maybe this will help: the trouble is that when Satoshi released Bitcoin 0.1 and the whitepaper, he included the idea of a version of the client that keeps only block headers and skims for transactions it cares about. It's become known as SPV--Simplified Payment Verification--or a “lightweight client”, and it has a weaker trust model than a full verifying client.

We are not discussing that in this thread. What's going on here is the creation of a full-featured “thick client” which doesn't require the entire block chain history and could conceivably be run even on low-memory and embedded devices as bitcoin scales up. You can have your cake and eat it too.

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
etotheipi (OP)
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1093


Core Armory Developer


View Profile WWW
June 22, 2012, 12:08:58 PM
 #121

The number one problem that this solves (besides blockchain pruning) is one of trust.  When a new node gets on the network right now, there are two options:

(1) Run a full node, with full blockchain, ultimate security -- verify everything in the blockchain yourself
(2) Run a lightweight node, with reduced level of security -- trust that someone else gave you your own correct balance, and have no way to check whether transactions are valid, especially zero-conf tx

With this meta-chain in place, you can run (1) for a lot less disk space, and (2) (lightweight nodes) can achieve the ultimate security model without needing to hold the full chain.  I can verifiably import my own wallet knowing nothing but the headers, and verify it directly against the blockchain.  If someone gives me a zero-conf tx, I can check not only that its inputs exist, but that the inputs haven't been spent yet.  Zero-conf tx should not really be trusted, anyway... but at least you can verify whether it's even possible for the network to accept it, which is what a full node does.
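As a rough sketch of that zero-conf check (Python; a plain dict stands in for the authenticated unspent-TxOut tree, and script/signature validation is omitted):

Code:
# Every input must reference an outpoint that is present -- i.e. currently
# unspent -- in the set the block headers commit to, and the inputs must
# cover the outputs.
def screen_zero_conf(tx, utxo_set):
    total_in = 0
    for outpoint in tx["inputs"]:            # (txid, vout) pairs
        if outpoint not in utxo_set:
            return False                     # nonexistent or already spent
        total_in += utxo_set[outpoint]
    return total_in >= sum(tx["outputs"])    # surplus, if any, is the fee

utxo_set = {(b"aa" * 16, 0): 50, (b"bb" * 16, 1): 10}
tx = {"inputs": [(b"aa" * 16, 0)], "outputs": [49]}
print(screen_zero_conf(tx, utxo_set))        # True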


Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
Serith
Sr. Member
****
Offline Offline

Activity: 269
Merit: 250


View Profile
June 22, 2012, 04:46:33 PM
 #122

If someone gives me a zero-conf tx, I can check not only that its inputs exist, but that the inputs haven't been spent yet.  Zero-conf tx should not really be trusted, anyway... but at least you can verify whether it's even possible for the network to accept it, which what a full node does.
I posted an idea about how to make 0-conf tx safe. There was some discussion, and the idea hasn't been killed so far, but there wasn't much interest either, so I am starting to believe that there is no need for 0-conf tx.
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1135


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
June 22, 2012, 04:49:16 PM
 #123

I posted an idea about how to make 0-conf tx safe. There was some discussion, and it hasn't been killed so far, but there wasn't much interest either, so I am starting to believe that it's not something in demand.

I think when one of the core developers immediately points out how it can be used to reliably defraud somebody, that makes it pretty much DOA.  But I would propose that there is indeed demand for a reliable zero-confirmation transaction.

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
etotheipi (OP)
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1093


Core Armory Developer


View Profile WWW
June 22, 2012, 07:38:33 PM
 #124

I posted an idea about how to make 0-conf tx safe. There was some discussion, and it hasn't been killed so far, but there wasn't much interest either, so I am starting to believe that it's not something in demand.

I think when one of the core developers immediately points out how it can be used to reliably defraud somebody, that makes it pretty much DOA.  But I would propose that there is indeed demand for a reliable zero-confirmation transaction.

I hate bringing up zero-conf tx in general, because there are so many issues with them that they are useful only in isolated cases, usually partial-trust situations.  I guess this shouldn't be seen as a driving force for implementing this proposal, but more as an example of how much verifiable information can be obtained from the network with minimal download.  It's fast enough that a light node could obtain as much information about a zero-conf tx as a full node, with just a few kB downloaded.  Regardless of how that information is used, it's a huge functional advantage over where we are currently.

Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
Serith
Sr. Member
****
Offline Offline

Activity: 269
Merit: 250


View Profile
June 22, 2012, 11:43:28 PM
Last edit: June 22, 2012, 11:55:01 PM by Serith
 #125

I posted an idea about how to make 0-conf tx safe. There was some discussion, and it hasn't been killed so far, but there wasn't much interest either, so I am starting to believe that it's not something in demand.

I think when one of the core developers immediately points out how it can be used to reliably defraud somebody, that makes it pretty much DOA.  But I would propose that there is indeed demand for a reliable zero-confirmation transaction.

I believe he was wrong; at least, no one objected when I explained why it is not a problem: a basic double-spend check by a pool completely solves it. And if you agree with him, could you elaborate on your opinion here or in that thread? The good thing about the idea is that it shares the non-disruptive property of etotheipi's proposal. It doesn't require any changes to Bitcoin, just collaboration between 2 or 3 major pools.

I pay you normally without using this system, then use the pool-signed txn to pay myself in a doublespend.  It would make attacks very reliable.
A pool must check whether any conflicting transaction is already present and, if so, refuse to sign the multi-signature transaction.
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1135


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
June 23, 2012, 12:33:42 AM
Last edit: June 23, 2012, 12:50:35 AM by casascius
 #126

Instead of saying your idea is wrong, maybe I can contribute to it.

One of the things gmaxwell pointed out was that mining pools may be around now, but there is no guarantee they will be around later - in fact, he thinks they probably won't be. So depending on them as fundamental architecture is probably a bad idea all around.

But imagine miners (both solo and pools) included an IP:Port calling card in the coinbase of their blocks. The calling card would convey the message: I am a miner; you can contact me via UDP directly at (ip:port); send me your transaction, and if it looks valid, I will give you a signed promise (signed by the coinbase key) that I accept and plan to confirm this transaction.

One would know what percentage of the mining pool any given calling card represents just by the number of recent blocks containing it.

Someone wanting a miner commitment on a transaction would blast out that transaction via UDP to all of the miners whose calling cards appear in the last 1000 blocks.  That sounds extreme, but we're only talking a few hundred kilobytes total, with the total going to each miner being under 1-2KB.

By using UDP instead of TCP, one could blindly blast out a bunch of simultaneous requests into the internet on a "best effort" basis with low overhead, knowing most of them will arrive and many won't and that that's OK.  The responses would arrive the same way.

Either the sender or the recipient of a transaction could immediately contact thousands of miners with a blast of udp packets and rapidly get an accurate feel for how much mining power supporting the transaction has just by gauging the udp responses that come back within the following 10 seconds.

If it is a supermajority you have success.

Such could be the standard practice for accepting zero-conf transactions.  It could be an excellent revenue generator for the mining community as a whole in the form of a for-pay service (for example, all miners could stipulate that this UDP confirmation service is only available if the transaction fee [in the transaction being zero-confirmed] meets a far more generous criterion than the satoshi client minimum).

To address gmaxwell's rightfully placed fear that I could pay "you" and then use the premium service to pay the doublespend to myself... if getting zero-conf is a fee-based service paid by the payer, then "you" could demand, as a condition of giving me goods with zero conf, that I include a fee big enough to ensure that you can click a button in your client and enjoy the confirmation service yourself, prepaid.
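A rough Python sketch of the blast-and-tally (the message formats are invented for illustration, and verification of the coinbase-key signatures on the replies is omitted):

Code:
import socket, time

# miner_cards: {(ip, port): recent_block_count}, harvested from the
# calling cards in the last 1000 coinbases.
def poll_miners(raw_tx, miner_cards, wait=10.0):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setblocking(False)
    for addr in miner_cards:
        sock.sendto(raw_tx, addr)            # best-effort blast; losses OK
    total = sum(miner_cards.values())
    supporting, seen = 0, set()
    deadline = time.time() + wait
    while time.time() < deadline:
        try:
            reply, addr = sock.recvfrom(4096)
        except BlockingIOError:
            time.sleep(0.05)
            continue
        # a real client would verify the signed promise here
        if addr in miner_cards and addr not in seen and reply.startswith(b"ACK"):
            seen.add(addr)
            supporting += miner_cards[addr]  # weight by recent blocks found
    return supporting / total if total else 0.0

If the returned fraction represents a supermajority of recent hashpower, the zero-conf payment is accepted.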

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
Serith
Sr. Member
****
Offline Offline

Activity: 269
Merit: 250


View Profile
June 23, 2012, 09:37:09 PM
 #127

Instead of saying your idea is wrong, maybe I can contribute to it.

One of the things gmaxwell pointed out was that mining pools may be around now, but there is no guarantee they will be around later - in fact, he thinks they probably won't be. So depending on them as fundamental architecture is probably a bad idea all around.

But imagine miners (both solo and pools) included an IP:Port calling card in the coinbase of their blocks. The calling card would convey the message: I am a miner; you can contact me via UDP directly at (ip:port); send me your transaction, and if it looks valid, I will give you a signed promise (signed by the coinbase key) that I accept and plan to confirm this transaction.

One would know what percentage of the mining pool any given calling card represents just by the number of recent blocks containing it.

Someone wanting a miner commitment on a transaction would blast out that transaction via UDP to all of the miners whose calling cards appear in the last 1000 blocks.  That sounds extreme, but we're only talking a few hundred kilobytes total, with the total going to each miner being under 1-2KB.

By using UDP instead of TCP, one could blindly blast out a bunch of simultaneous requests into the internet on a "best effort" basis with low overhead, knowing most of them will arrive and many won't and that that's OK.  The responses would arrive the same way.

Either the sender or the recipient of a transaction could immediately contact thousands of miners with a blast of udp packets and rapidly get an accurate feel for how much mining power supporting the transaction has just by gauging the udp responses that come back within the following 10 seconds.

If it is a supermajority you have success.

Such could be the standard practice for accepting zero-conf transactions.  It could be an excellent revenue generator for the mining community as a whole in the form of a for-pay service (for example, all miners could stipulate that this UDP confirmation service is only available if the transaction fee [in the transaction being zero-confirmed] meets a far more generous criterion than the satoshi client minimum).

To address gmaxwell's rightfully placed fear that I could pay "you" and then use the premium service to pay the doublespend to myself... if getting zero-conf is a fee-based service paid by the payer, then "you" could demand, as a condition of giving me goods with zero conf, that I include a fee big enough to ensure that you can click a button in your client and enjoy the confirmation service yourself, prepaid.

I agree with you and with the second part of gmaxwell's post: my proposal brings more centralization, and that's not ideal. You are thinking about making 0-confirmation tx acceptance work in a decentralized or less centralized way, which would be great if it's possible. I see problems with your proposal, but I am not ready to discuss them yet.

I think we are derailing etotheipi's thread so I made a copy of the post in my thread.
eldentyrell
Donator
Legendary
*
Offline Offline

Activity: 980
Merit: 1004


felonious vagrancy, personified


View Profile WWW
June 24, 2012, 09:45:09 PM
Last edit: June 24, 2012, 10:43:45 PM by eldentyrell
 #128

Trustless lightweight-node support

It doesn't seem trustless to me. Lightweight nodes (not storing all unspent outputs) can't know whether a block is valid, so they need to trust the majority of the network's mining power. This is no more secure than SPV, though possibly a little easier for lightweight nodes.

This is a subtle but important point.  A while back I wrote a section about it on the bitcoin wiki.  The Satoshi client never uses "number of blocks deep" as a measure of confidence that a transaction is valid.  Depth is used only as a measure of the likelihood of another, longer chain branch emerging that omits the transaction.

A truly trustless thin client needs to be able to verify a recent block's height (that there really are 180,000 blocks before this one and they obey the max-4x-difficulty-adjustment rule) rather than its depth (that there really were 6 blocks built on top of it -- also known as confirmations).

It's possible to do height verification probabilistically without 2GB of disk+download, but you need more than a tree-of-unspent-transactions to do it -- the block chain has to look more like an upside-down tree or DAG (most recent block is the root) giving you multiple hash-secured O(log n)-long paths from any block to the genesis block.  These "shortcut ancestor links" can be added without 51% agreement.  If each challenge consists of the thin client picking the log(n)-long path and the server replying with the block headers along that path, it doesn't take many challenges or much time/bandwidth/space to drive the probability-of-being-snookered down to something well below the probability of your hardware randomly failing.  Once you have block height verification you can be truly trustless -- or, at least as trustless as the Satoshi client.
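To sketch the challenge mechanics (toy headers only; the real shortcut pointers would live in coinbases, and a real client would also check difficulty along the path):

Code:
import hashlib

# Toy "headers": a payload plus {ancestor_height: ancestor_hash} links
# (the ordinary prevblock pointer plus any shortcut pointers).
def header_hash(hdr):
    links = b"".join(hdr["links"][k] for k in sorted(hdr["links"]))
    return hashlib.sha256(hdr["payload"] + links).digest()

def verify_path(headers):
    # headers[0] is the recent block, headers[-1] is the genesis block;
    # each header must hash-commit to the next one down the path.
    for higher, lower in zip(headers, headers[1:]):
        if header_hash(lower) not in higher["links"].values():
            return False                     # server failed the challenge
    return True

genesis = {"payload": b"genesis", "links": {}}
mid = {"payload": b"mid", "links": {0: header_hash(genesis)}}
tip = {"payload": b"tip",
       "links": {0: header_hash(genesis), 1: header_hash(mid)}}
print(verify_path([tip, mid, genesis]))      # True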

The printing press heralded the end of the Dark Ages and made the Enlightenment possible, but it took another three centuries before any country managed to put freedom of the press beyond the reach of legislators.  So it may take a while before cryptocurrencies are free of the AML-NSA-KYC surveillance plague.
galambo
Sr. Member
****
Offline Offline

Activity: 966
Merit: 311



View Profile
June 24, 2012, 10:46:45 PM
 #129


This is a subtle but important point.  A while back I wrote a section about it on the bitcoin wiki.  The Satoshi client never uses "number of blocks deep" as a measure of confidence that a transaction is valid.

A truly trustless thin client needs to be able to verify a recent block's height (that there really are 180,000 blocks before this one and they obey the max-difficulty adjustment rules) rather than its depth (that there really were 6 blocks built on top of it -- also known as confirmations).

It's possible to do height verification probabilistically without 2GB of disk+download, but you need more than a tree-of-unspent-transactions to do it -- the block chain has to look more like a tree giving you multiple hash-secured O(log n)-long paths from any block to the genesis block.  These "shortcut ancestor links" can be added without 51% agreement.

How will the lightweight node know the path hashes are incorrect, and know to reject that part of a block?

I can see how the thick clients can verify this, but an incorrect path would require all of the thick clients to reject the block if it contained a false path. I think this means another protocol decision point like P2SH.  Otherwise, we'd have ignorant thick clients forwarding maliciously crafted "tree transactions" to the light clients.

This isn't meant to be a gotcha. I like your idea (a lot), but I'd like some clarification. I've only started looking into this part of Bitcoin and I'm not a very big data structures guy.

I also don't understand why you think these could be added without the "51%."
eldentyrell
Donator
Legendary
*
Offline Offline

Activity: 980
Merit: 1004


felonious vagrancy, personified


View Profile WWW
June 24, 2012, 10:55:29 PM
Last edit: June 24, 2012, 11:20:01 PM by eldentyrell
 #130

How will the lightweight node know the path hashes are incorrect, and know to reject that part of a block?

All of these "links" are hashes -- they're self-validating just like the previous-block (prevblock) hash that Bitcoin currently uses.  These are just "extra prevblock pointers", but they're in the coinbase instead of in the headers and they point to an arbitrary ancestor instead of the immediate predecessor.  I'll call these new pointers non-prevblock pointers to distinguish them from the pointers we already have.


I can see how the thick clients can verify this, but an incorrect path would require all of the thick clients to reject the block if it contained a false tree.

I definitely do not propose adding any new block validity criteria.  You're right: hostile miners can add blocks with broken non-prevblock pointers.


Otherwise, we'd have ignorant thick clients forwarding maliciously crafted "tree transactions" to the light clients.

If a hostile miner gets a block with broken non-prevblock pointers into the chain, it will amount to -- at worst -- a DOS attack on thin clients until the next block is found by friendly miners.  As long as the hostile block is at the top of the chain, all thin clients will believe that all servers are lying to them.  But they won't be compromised -- they'll simply twiddle their thumbs saying "Nobody has been able to convince me of what reality ought to look like".

As soon as the next {non-prevblock-pointer}-aware miner finds a block, they will add a block which refrains from creating any new paths through the malicious block.  Thin clients resume operation as normal.  In effect, blocks with broken non-prevblock pointers get excluded from the "DAG-within-the-blockchain".

Thin clients which were connected before the malicious block arrived won't be affected.

There are ways to make the cost of the DOS attack above very large (like >99% hashpower) but they add complexity.  


I think this means another protocol decision point like P2SH.

Definitely not!  I believe the trustless thin client problem can be solved without another hardfork (or even miners-only fork).


I also don't understand why you think these could be added without the "51%."

Right, so, suppose I control 1% of the network hashpower.  That means I create 1% of the blocks.  If the radix of the non-prevblock-pointer tree is, say, 200 (i.e. each block can have 200 non-prevblock pointers), that means my blocks will be reducing the average path length in the tree more rapidly than the other 99% of the network is adding new blocks.  So the average path length will gradually converge to log(height), even if only 1% of the miners are adding the new pointers -- it's just that each block they add has to include more than 100 new pointers.  Of course, the more miners participate, the more rapidly we get to the ideal log(height) situation…
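Spelling out the arithmetic behind that claim (RADIX is from the post; the 1% participation rate is the assumption being tested):

Code:
# Each pointer-aware block may carry up to RADIX shortcut links, so even a
# small aware fraction f adds links faster than the network adds blocks,
# letting average path length keep converging toward log(height).
RADIX = 200      # shortcut pointers an aware block can include
f = 0.01         # fraction of hashpower adding the pointers

links_per_network_block = f * RADIX
print(links_per_network_block)   # 2.0 new links per 1 new block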

The printing press heralded the end of the Dark Ages and made the Enlightenment possible, but it took another three centuries before any country managed to put freedom of the press beyond the reach of legislators.  So it may take a while before cryptocurrencies are free of the AML-NSA-KYC surveillance plague.
eldentyrell
Donator
Legendary
*
Offline Offline

Activity: 980
Merit: 1004


felonious vagrancy, personified


View Profile WWW
June 24, 2012, 11:35:40 PM
 #131

If each challenge consists of the thin client picking the log(n)-long path and the server replying with the block headers along that path, it doesn't take many challenges or much time/bandwidth/space to drive the probability-of-being-snookered down to something well below the probability of your hardware randomly failing.

By the way, since these servers are full clients they could charge thin clients some tiny amount like 0.00000001 BTC per challenge, giving people an incentive to run full-chain clients on machines with big disks.

Thin clients should use whichever server they are able to contact that has failed the fewest challenges (an honest server will only fail challenges if there is a hostile block at the top of the chain).  Bad guys could run a server without a blockchain that just spews back junk answers, but they'd only get one challenge per thin client and then be promptly ignored -- not enough money to be worth the trouble.

The printing press heralded the end of the Dark Ages and made the Enlightenment possible, but it took another three centuries before any country managed to put freedom of the press beyond the reach of legislators.  So it may take a while before cryptocurrencies are free of the AML-NSA-KYC surveillance plague.
maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1011


View Profile
June 25, 2012, 02:41:59 PM
 #132

Over in the altchain section I announced a crowd-funding campaign for a demurrage currency. If we reach our goal, one of the things we will do is fully implement etotheipi's proposal, either in Armory or the official client, and back-port the changes to Bitcoin.

Although relevant, I don't want to spam the forum, so this will be my only cross-post regarding it.

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
btharper
Sr. Member
****
Offline Offline

Activity: 389
Merit: 250



View Profile
June 26, 2012, 06:49:13 AM
 #133

Sub. I think I'll have to reread everything already; I'm not sure there's one post with a full description anymore, just a lot of ideas that have been getting pieced together. Not to say that's bad - I just need to make sure I can parse out the current "best" proposal.
mp420
Hero Member
*****
Offline Offline

Activity: 501
Merit: 500


View Profile
June 27, 2012, 08:48:06 AM
 #134

I think this and other related proposals are the only ones around that really take scalability seriously. I haven't really gotten my head around the specifics yet, but the thing I really like about this proposal is that it requires no changes to "Bitcoin Proper" at all.

I really hope this goes forward.
unclescrooge
aka Raphy
Hero Member
*****
Offline Offline

Activity: 868
Merit: 1000


View Profile
July 06, 2012, 12:53:30 PM
 #135

Hello,

Did the developers reach an agreement on how to prune the blockchain?

I didn't see much activity on the mailing list.
apetersson
Hero Member
*****
Offline Offline

Activity: 668
Merit: 501



View Profile
July 06, 2012, 03:01:21 PM
Last edit: July 06, 2012, 08:49:30 PM by apetersson
 #136

I think right now we need an experimental implementation to see how this approach would perform in practice.

IMO, the ideas outlined by etotheipi are the right way to go for a tiered bitcoin network, and for better scalability.
jgarzik
Legendary
*
qt
Offline Offline

Activity: 1596
Merit: 1091


View Profile
July 06, 2012, 03:45:28 PM
 #137

Did the developers reach an agreement on how to prune the blockchain?

sipa has been working on his "ultraprune" branch at github.  It is discussed on IRC.


Jeff Garzik, Bloq CEO, former bitcoin core dev team; opinions are my own.
Visit bloq.com / metronome.io
Donations / tip jar: 1BrufViLKnSWtuWGkryPsKsxonV2NQ7Tcj
apetersson
Hero Member
*****
Offline Offline

Activity: 668
Merit: 501



View Profile
July 06, 2012, 08:49:05 PM
 #138

sipa has been working on his "ultraprune" branch at github.  It is discussed on IRC.

The pruning efforts are nice and will lead to much faster desktop clients and full nodes, but they do not really solve the problem of very light nodes trying to obtain their bitcoin balance and transaction history from the network quickly.
ArticMine
Legendary
*
Offline Offline

Activity: 2282
Merit: 1050


Monero Core Team


View Profile
July 11, 2012, 10:55:24 PM
 #139

Before commenting on this thread I reviewed Satoshi Nakamoto's original paper: Bitcoin: A Peer-to-Peer Electronic Cash System, bitcoin.org/bitcoin.pdf, and I am left with two questions:

1) How is this proposal better or worse than 7. Reclaiming Disk Space in "Bitcoin: A Peer-to-Peer Electronic Cash System" with respect to overall blockchain size management?
2) How is this proposal better or worse than 8. Simplified Payment Verification in "Bitcoin: A Peer-to-Peer Electronic Cash System" with respect to verifying payments?

Concerned that blockchain bloat will lead to centralization? Storing less than 4 GB of data once required the budget of a superpower and a warehouse full of punched cards. https://upload.wikimedia.org/wikipedia/commons/8/87/IBM_card_storage.NARA.jpg https://en.wikipedia.org/wiki/Punched_card
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1135


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
July 11, 2012, 11:07:56 PM
 #140

Before commenting on this thread I reviewed Satoshi Nakamoto's original paper: Bitcoin: A Peer-to-Peer Electronic Cash System, bitcoin.org/bitcoin.pdf, and I am left with two questions:

1) How is this proposal better or worse than 7. Reclaiming Disk Space in "Bitcoin: A Peer-to-Peer Electronic Cash System" with respect to overall blockchain size management?

This works as a way to reclaim disk space provided you are starting with the whole block chain, but as presented, there is no way for one node to convey that stubbed tree to another node along with the assurance that only spent transactions have been removed.  If I run a node that prunes and stubs off a transaction showing I spent some coins, and then send you that pruned block, my spent coins look unspent to you.

Since it's a solution that's only useful to a node with the full block chain, and the real problem we face is more the downloading of the block chain rather than storing it, a solution that requires a full block chain download before anything can be safely pruned doesn't address the problem.

2) How is this proposal better or worse than 8. Simplified Payment Verification in "Bitcoin: A Peer-to-Peer Electronic Cash System" with respect to verifying payments?

That proposal suggests spending the funds and then watching to see if the rest of the network confirms the spend into a block before any useful verification is possible, or freshly receiving the funds while watching new blocks.  The idea discussed in this thread would allow instant verification of the existence of pre-existing funds without having to spend them first or downloading any blocks at all - and is actually not a different proposal, but the same proposal with significant improvements.

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
ArticMine
Legendary
*
Offline Offline

Activity: 2282
Merit: 1050


Monero Core Team


View Profile
July 11, 2012, 11:31:18 PM
 #141

Before commenting on this thread I reviewed Satoshi Nakamoto's original paper: Bitcoin: A Peer-to-Peer Electronic Cash System, bitcoin.org/bitcoin.pdf, and I am left with two questions:

1) How is this proposal better or worse than 7. Reclaiming Disk Space in "Bitcoin: A Peer-to-Peer Electronic Cash System" with respect to overall blockchain size management?

This works as a way to reclaim disk space provided you are starting with the whole block chain, but as presented, there is no way for one node to convey that stubbed tree to another node along with the assurance that only spent transactions have been removed.  If I run a node that prunes and stubs off a transaction showing I spent some coins, and then send you that pruned block, my spent coins look unspent to you.

Since it's a solution that's only useful to a node with the full block chain, and the real problem we face is more the downloading of the block chain rather than storing it, a solution that requires a full block chain download before anything can be safely pruned doesn't address the problem.

2) How is this proposal better or worse than 8. Simplified Payment Verification in "Bitcoin: A Peer-to-Peer Electronic Cash System" with respect to verifying payments?

That proposal suggests spending the funds and then watching to see if the rest of the network confirms the spend into a block before any useful verification is possible, or freshly receiving the funds while watching new blocks.  The idea discussed in this thread would allow instant verification of the existence of pre-existing funds without having to spend them first or downloading any blocks at all - and is actually not a different proposal, but the same proposal with significant improvements.

Yes, but under (1), what happens when you actually try to double-spend the funds to me? I can still verify the double spend is in fact a double spend, because I have the subsequent block hashes; so what is the incentive to convey the block with the previous spend information removed?

Concerned that blockchain bloat will lead to centralization? Storing less than 4 GB of data once required the budget of a superpower and a warehouse full of punched cards. https://upload.wikimedia.org/wikipedia/commons/8/87/IBM_card_storage.NARA.jpg https://en.wikipedia.org/wiki/Punched_card
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1135


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
July 11, 2012, 11:49:37 PM
 #142

Yes, but under (1), what happens when you actually try to double-spend the funds to me? I can still verify the double spend is in fact a double spend, because I have the subsequent block hashes; so what is the incentive to convey the block with the previous spend information removed?

The subsequent block hashes don't tell you whether or not the funds are spent.  The only way you know funds are spent is that you know of a transaction that spends them.  There is no present way to know that certain funds are NOT spent unless you have the whole block chain coming after that transaction.

Remember, the goal is to eliminate a boundless multi-gigabyte download for new users.  The only way to solve that is to remove some information from that data set so it is smaller.  The party receiving the reduced data set can be assured that the data that's there actually belongs there, but he has no way to know whether the missing (pruned) data was actually supposed to be pruned.  This proposal addresses that.

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
ArticMine
Legendary
*
Offline Offline

Activity: 2282
Merit: 1050


Monero Core Team


View Profile
July 12, 2012, 12:17:44 AM
 #143

Yes, but under (1), what happens when you actually try to double-spend the funds to me? I can still verify the double spend is in fact a double spend, because I have the subsequent block hashes; so what is the incentive to convey the block with the previous spend information removed?

The subsequent block hashes don't tell you whether or not the funds are spent.  The only way you know funds are spent is that you know of a transaction that spends them.  There is no present way to know that certain funds are NOT spent unless you have the whole block chain coming after that transaction.

Remember, the goal is to eliminate a boundless multi-gigabyte download for new users.  The only way to solve that is to remove some information from that data set so it is smaller.  The party receiving the reduced data set can be assured that the data that's there actually belongs there, but he has no way to know whether the missing (pruned) data was actually supposed to be pruned.  This proposal addresses that.

The fact that the correct data is pruned is secured on a second blockchain with merged mining. I can see the point of this for certain applications such as verifying the integrity of a physical Bitcoin without actually opening it up and spending the funds. So there is an advantage in that respect over the proposal in Bitcoin: A Peer-to-Peer Electronic Cash System.

Concerned that blockchain bloat will lead to centralization? Storing less than 4 GB of data once required the budget of a superpower and a warehouse full of punched cards. https://upload.wikimedia.org/wikipedia/commons/8/87/IBM_card_storage.NARA.jpg https://en.wikipedia.org/wiki/Punched_card
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1135


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
July 12, 2012, 12:24:48 AM
 #144

The fact that the correct data is pruned is secured on a second blockchain with merged mining. I can see the point of this for certain applications such as verifying the integrity of a physical Bitcoin without actually opening it up and spending the funds. So there is an advantage in that respect over the proposal in Bitcoin: A Peer-to-Peer Electronic Cash System.

It's also useful for instantly screening an incoming unconfirmed transaction as being "good pending confirmation" versus "totally bogus and no chance of confirmation" without needing a block chain at all.

It's also useful not just for physical bitcoins, but also if people start printing disposable bitcoin cash from their printer (example: a user clicks File -> Print Money to "be their own bank" instead of driving to an ATM and paying an ATM fee - something I see as wildly compatible with the average joe.  Such self-printed bills would have the same requirements as physical bitcoins).

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
iain
Jr. Member
*
Offline Offline

Activity: 33
Merit: 7



View Profile WWW
July 12, 2012, 01:51:43 AM
Last edit: July 12, 2012, 02:18:07 AM by iain
 #145

Apologies if this is a stupid question, but... would it not also be of benefit to make available, by the same sort of methods, a record of all the spent txouts? I'm thinking of how things can go when a future light client is querying (in untrusting style) a full client node - perhaps in exchange for a micropayment, or a subscription or whatever - about whether various txouts are spent or unspent. Here's an anthropomorphized dialogue. (Obviously the number of rounds back and forth is higher than it needs to be, just for the sake of anthropomorphizing the exchange.)

Light client (LC): hi, is this txout (.....) spent or unspent?
Full client (FC): unspent.
LC: ...and your proof? (I don't trust you, remember!)
FC: here's the merkle chain [or similar thing for whatever the data structure is exactly] leading from your txout to the root hash, which you can acquire from the network with suitably impressive [merged-]mining effort associated therewith. (transmits a pleasantly short merkle chain)
LC: ok, thanks very much! now, what about this txout: spent or unspent?
FC: spent.
LC: ...and your proof?
FC: well, uh, the proof is the fact that I'm not sending you a merkle chain like I did with the unspent one!
LC: sorry, but that's not a proof from my point of view!

With a record of spent txouts, the FC can send a merkle chain convincing the untrusting LC of a txout's spent status, rather than having to say "trust my absence-of-proof-of-unspent to mean it's spent".

(Maybe this is overkill? The FC could just send the transaction which spends the given txout. But maybe that was a transaction that "died" (failed to be bedded down in the winning blockchain) long ago, and for the FC to send it is misleading. Admittedly, the main reason for such "dyings" is losing a double-spending battle, in which case the relevant txout is likely still in fact spent, just not by the transaction the FC is advertising. But one can imagine scenarios where the LC wants proof not just of spent status, but of where it went, and a suitable data structure could support a reply of "it's spent, and here's the tx it went into, and here's a pleasantly short merkle chain establishing this".)
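For concreteness, a minimal Python sketch of how the LC could check such a merkle chain against a root hash taken from the merged-mined headers (the (sibling, side) proof layout is just one possible encoding):

Code:
import hashlib

def H(b):
    return hashlib.sha256(b).digest()

def verify_merkle_chain(leaf, proof, root):
    # proof: list of (sibling_hash, side) pairs from the leaf up to the root
    node = H(leaf)
    for sibling, side in proof:
        node = H(sibling + node) if side == "left" else H(node + sibling)
    return node == root

# A 4-leaf demo tree, proving membership of leaf 2:
leaves = [b"txout-%d" % i for i in range(4)]
l = [H(x) for x in leaves]
n01, n23 = H(l[0] + l[1]), H(l[2] + l[3])
root = H(n01 + n23)
proof = [(l[3], "right"), (n01, "left")]
print(verify_merkle_chain(leaves[2], proof, root))   # True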
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1135


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
July 12, 2012, 02:32:31 AM
 #146

It would be a bit different than that. Rather than saying "spent" it would say and prove "not unspent".

The way it would do this is by sending the leaf nodes immediately surrounding where the unspent tx would go IF it existed, along with their Merkle lineage. This would be like me proving to you there are no "fockheads" in the phone book by sending you the pages immediately surrounding where "fockhead" would appear alphabetically (e.g. Fock thru Foster). This assumes everything is kept in sorted order, which is what is proposed in this thread.
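A sketch of that check from the light client's side (Python; establishing that the two bracketing leaves are actually adjacent would in practice use leaf positions carried in the proofs, which this sketch glosses over):

Code:
import hashlib

def H(b):
    return hashlib.sha256(b).digest()

def verify_merkle_chain(leaf, proof, root):
    node = H(leaf)
    for sibling, side in proof:
        node = H(sibling + node) if side == "left" else H(node + sibling)
    return node == root

def verify_absence(key, left_leaf, lproof, right_leaf, rproof, root):
    # both bracketing leaves must be in the committed, sorted tree...
    if not (verify_merkle_chain(left_leaf, lproof, root) and
            verify_merkle_chain(right_leaf, rproof, root)):
        return False
    # ...and the key must fall strictly between them
    return left_leaf < key < right_leaf

# Two-leaf phone book: proves "fockhead" is absent.
root = H(H(b"fock") + H(b"foster"))
print(verify_absence(b"fockhead",
                     b"fock", [(H(b"foster"), "right")],
                     b"foster", [(H(b"fock"), "left")],
                     root))                  # True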

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
ArticMine
Legendary
*
Offline Offline

Activity: 2282
Merit: 1050


Monero Core Team


View Profile
July 12, 2012, 03:57:25 AM
 #147


It's also useful for instantly screening an incoming unconfirmed transaction as being "good pending confirmation" versus "totally bogus and no chance of confirmation" without needing a block chain at all.

It's also useful not just for physical bitcoins, but if people start printing disposable bitcoin cash from their printer. (example: user clicks File -> Print Money to "be their own bank" instead of driving to an ATM and paying an ATM fee - something I see as wildly compatible with the average joe.  such self-printed bills would have the same requirements as physical bitcoins). 

Basically any situation where one needs to prove what the balance is in a particular bitcoin address without actually spending the coins in that address.

Concerned that blockchain bloat will lead to centralization? Storing less than 4 GB of data once required the budget of a superpower and a warehouse full of punched cards. https://upload.wikimedia.org/wikipedia/commons/8/87/IBM_card_storage.NARA.jpg https://en.wikipedia.org/wiki/Punched_card
iain
Jr. Member
*
Offline Offline

Activity: 33
Merit: 7



View Profile WWW
July 12, 2012, 07:15:11 AM
 #148

It would be a bit different than that. Rather than saying "spent" it would say and prove "not unspent".

The way it would do this is by sending the leaf nodes immediately surrounding where the unspent tx would go IF it existed, along with their Merkle lineage. This would be like me proving to you there are no "fockheads" in the phone book by sending you the pages immediately surrounding where "fockhead" would appear alphabetically (e.g. Fock thru Foster). This assumes everything is kept in sorted order, which is what is proposed in this thread.


Ah, of course! Thanks for the explanation.

Perhaps a record of spent txouts would still be of some value, to cover the case where a light client wants to query a full node with "prove to me how (i.e. into what tx) this txout was spent"? (But then again, maybe the need (if any) for such a query type would be sufficiently rare and specialized that running a full node oneself can be deemed the reasonable way to meet such an "expert" need.)
maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1011


View Profile
July 12, 2012, 03:41:38 PM
 #149

Without being too snarky, we have such an index: it's called the block chain. A full node would send the transaction in which it was spent, the block header that transaction was included in, and the path through the transaction Merkle-tree linking the two.

Now in the far, distant future it might be useful to have an index of block hashes so that the lite node doesn't even have to keep track of that information, but right now the overhead of maintaining that tree vs. the storage cost (about 1.25 MB per year) doesn't make sense.
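In code, that spend proof is just the familiar SPV check. A sketch, assuming Bitcoin's double-SHA256 Merkle convention (the function names are made up; the lite node separately checks that the header appears in its best chain of headers):

Code:
import hashlib

def dhash(b):
    # Bitcoin's double-SHA256
    return hashlib.sha256(hashlib.sha256(b).digest()).digest()

def verify_spend_proof(spending_tx, index, merkle_branch, header_merkle_root):
    # Re-derive the block's transaction-tree root from the spending tx
    # and its audit path; if it matches the header's root, the tx really
    # is in that block, so the txout in question really was spent.
    h = dhash(spending_tx)
    for sibling in merkle_branch:
        h = dhash(h + sibling) if index % 2 == 0 else dhash(sibling + h)
        index //= 2
    return h == header_merkle_root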

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
iain
Jr. Member
*
Offline Offline

Activity: 33
Merit: 7



View Profile WWW
July 14, 2012, 06:51:05 PM
 #150

Without being too snarky, we have such an index: it's called the block chain. A full node would send the transaction in which it was spent, the block header that transaction was included in, and the path through the transaction Merkle-tree linking the two.

Now in the far, distant future it might be useful to have an index of block hashes so that the lite node doesn't even have to keep track of that information, but right now the overhead of maintaining that tree vs. the storage cost (about 1.25 MB per year) doesn't make sense.

Yes indeed, you're quite right, my worry was groundless. Well then, this is really exciting, the road to a secure lightweight client is open and clear!
mp420
Hero Member
*****
Offline Offline

Activity: 501
Merit: 500


View Profile
July 26, 2012, 10:38:40 AM
 #151

I tried to read the OP again and I have a question. I apologize for the fact that I'm not very Bitcoin-literate and I may be asking things that are obvious.

When a lite node is trying to check the balance of a particular address, it needs to have downloaded the (pruned) alt-chain and every block of the main chain that has been added since the last alt-chain block.

Or is there a zero-trust way for a full node to assure the lite node that a TxOut hasn't been spent since the last alt-chain block, without requiring the lite node to ever download full primary-chain blocks?

Of course if we drop the zero-trust requirement, it's trivial to set up a third party service that does the validation for the lite clients up to the last alt-chain block.
Maged
Legendary
*
Offline Offline

Activity: 1204
Merit: 1015


View Profile
July 26, 2012, 12:10:04 PM
 #152

When a lite node is trying to check the balance of a particular address, it needs to have downloaded the (pruned) alt-chain and every block of the main chain that has been added since the last alt-chain block.
Pretty much. That being said, keep in mind that this would allow someone to become a full node without much trust while ONLY having to download that much.

maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1011


View Profile
July 26, 2012, 01:46:31 PM
 #153

When a lite node is trying to check the balance of a particular address, it needs to have downloaded the (pruned) alt-chain and every block of the main chain that has been added since the last alt-chain block.
Pretty much. That being said, keep in mind that this would allow someone to become a full node without much trust while ONLY having to download that much.
No, the lite client need only download the alt-chain headers since the last checkpoint, and then can request a path through the Merkle-tree to the unspent TxOut (or the surrounding outputs, if it has in fact been spent).

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1135


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
July 26, 2012, 04:24:06 PM
 #154

When a lite node is trying to check the balance of a particular address, it needs to have downloaded the (pruned) alt-chain and every block of the main chain that has been added since the last alt-chain block.
Pretty much. That being said, keep in mind that this would allow someone to become a full node without much trust while ONLY having to download that much.

When a lite node is trying to check the balance of a particular address, it needs to have downloaded the (pruned) alt-chain and every block of the main chain that has been added since the last alt-chain block.
Pretty much. That being said, keep in mind that this would allow someone to become a full node without much trust while ONLY having to download that much.
No, the lite client need only download the alt-chain headers since the last checkpoint, and then can request a path through the Merkle-tree to the unspent TxOut (or the surrounding outputs, if it has in fact been spent).

Between the above two conflicting answers, the bottom one is the one I consider to be the more correct one.

Neither answer is false, but as I understand it, the way the OP wants to structure the tree, it would be possible both to prove and to disprove the existence of funds with a simple query to someone else, and to trust the answer with nothing more than the hash of the latest block.  Trusting that latest hash would typically require downloading the block headers, but not the entire blocks themselves.  What Maged says you can do, you can do; and what Maaku says you can do, I believe you can also do.

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
allten
Sr. Member
****
Offline Offline

Activity: 455
Merit: 250


You Don't Bitcoin 'till You Mint Coin


View Profile WWW
July 27, 2012, 04:58:32 AM
 #155

Still trying to understand if pruning the block chain is a good idea. I need a little help understanding. One of the premises of bitcoin is that we only need to trust math and cryptography: anyone can download the entire block chain and verify all the signatures, hashes, etc., and confirm that it 100% complies with the math, cryptography, and bitcoin's protocols. However, is this still possible once transactions with spent outputs are removed? It would seem that you would have to start putting trust in the mining community that there wasn't a mass conspiracy to give themselves bitcoins.

Sorry, I really want to understand the detailed technicals so please have patience with my lack of understanding.
I'll do my best to understand if someone will explain it.

I can definitely see pruned/compressed blockchains would be beneficial for many applications, but it worries me that it would be the norm for everyone.

Thanks to all working on what I also see as one of the more pressing issues with bitcoin right now.
socrates1024
Full Member
***
Offline Offline

Activity: 126
Merit: 108


Andrew Miller


View Profile
July 27, 2012, 05:44:41 AM
 #156

"Pruning the trees" is a red herring. Here is another way of explaining what this proposal is about.

Bitcoin already supports a Zero-Trust validation technique that is also 100% pruned - it only requires storing a single O(1) block hash. The problem is that it's blatantly inefficient. What we are doing with Merkle trees is altering the blockchain to make this technique, which is inherently fully pruned and zero-trust, more efficient.

[Inefficient O(N), Zero-trust, 100%-pruned technique] (It's just a counter-example, bear with me.) The entire bitcoin blockchain is already represented by the current block hash. Every client, even the lightest of light clients, stores the current block hash. If you know your current block hash, then you can't be fooled about any of the data in the chain. For example, if you want to check that a particular txoutput hasn't been spent, you iterate backwards through the blocks all the way to that transaction, checking that no intervening transaction spends it.

This isn't efficient. It literally involves processing the whole damn chain each time you validate a transaction. So, current light clients certainly don't do this - instead they have to trust a helper to validate that the transactions aren't double-spends. Full nodes have an easier time because they have the storage available to maintain an indexed database of unspent txouts.

[Merkle trees O(log N), Zero-trust, 100%-pruned] What we are all proposing in various ways is to alter how the blockchain history is represented in the current block hash. In addition to the previous block data, we will also include a 'snapshot' of the unspent txoutputs database in each block hash. This snapshot is the root of a Merkle tree, which gives us little shortcuts so we can validate a transaction much more efficiently, O(log N) per validation.
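To make the contrast concrete, a toy sketch of both techniques side by side (blocks, tx.inputs, and verify_path are stand-in names, not a real client's API):

Code:
def is_unspent_naive(outpoint, blocks):
    # O(N): walk the whole chain from the tip backwards; if no transaction
    # anywhere spends this outpoint, it is still unspent.
    for block in reversed(blocks):
        for tx in block.transactions:
            if outpoint in tx.inputs:
                return False
    return True

def is_unspent_snapshot(outpoint, proof, snapshot_root, verify_path):
    # O(log N): one audit path checked against the UTXO-set snapshot root
    # committed in the current block hash -- no chain iteration at all.
    return verify_path(outpoint, proof, snapshot_root)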

amiller on freenode / 19G6VFcV1qZJxe3Swn28xz3F8gDKTznwEM
[my twitter] [research@umd]
I study Merkle trees, credit networks, and Byzantine Consensus algorithms.
maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1011


View Profile
July 27, 2012, 06:57:02 AM
 #157

Neither answer is false, but as I understand it, the way the OP wants to structure the tree, it would be possible to both prove or disprove the existence of funds with a simple query to someone else, and trust it with nothing more than the hash of the latest block.  Trusting that latest hash would typically require downloading the block headers, but not the entire blocks themselves.  What Maged says you can do, you can do, but what Maaku says you can also do, I believe you can also do.
Yes, I should have been more explicit. I interpreted the original question as “what is the minimum necessary action that needs to be taken by a client to verify the status of a TxOut”. Maged's solution is perfectly valid and would result in the correct answer, but would also be more work for the client than is strictly necessary.

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
mp420
Hero Member
*****
Offline Offline

Activity: 501
Merit: 500


View Profile
July 27, 2012, 01:02:37 PM
 #158

Thanks, I think I understand it now: the Merkle tree is arranged so that there's always a path from the latest hash to each unspent TxOut, so only headers are required.

I think this idea is very clever and elegant. Of course the devil is in the implementation details. I'd very much like to see this implemented.
BrightAnarchist
Donator
Legendary
*
Offline Offline

Activity: 853
Merit: 1000



View Profile
July 27, 2012, 03:12:24 PM
 #159

I'd very much like to see this implemented.

Same here. So far more than 20 coins have been pledged  to the person or persons that implement this: https://bitcointalk.org/index.php?topic=93606 Hopefully more pledges soon...
allten
Sr. Member
****
Offline Offline

Activity: 455
Merit: 250


You Don't Bitcoin 'till You Mint Coin


View Profile WWW
July 27, 2012, 06:21:41 PM
 #160

Before I read this I just want to quickly post that, no matter whether justifiably or unjustifiably, I personally feel like this is the most pressing issue when it comes to Bitcoin's successful future, and I really hope the core team has an order of priorities planned accordingly.

I too believe this is a critical issue for Bitcoin as a whole.  I had floated the idea in the past that handling blockchain size was critical, but other issues seemed more pressing for the devs at the time -- I didn't have a solid idea to promote, and the blockchain size wasn't so out of hand yet.

One nice benefit of this solution is that because it's an alt-chain, technically no core devs have to be on-board.  It can be done completely independently and operate completely non-disruptively, even with only the support of other devs who believe in it.  I'd certainly like to get core devs interested in it, as they are very smart people who probably have a lot of good ideas to add.  But one of the biggest upsides here is that it can be done completely independently.

Read some more of your proposal today and I now have a better idea of how it works and what you are accomplishing. Good proposal BTW.
I'm still very concerned that this development will make it so the full blockchain database will not be as accessible for download.
It's important for Bitcoin that anyone can fully audit the blockchain and need only trust math and cryptography, not the miners (even though it would be an extremely low probability that enough miners would conspire to get coins in the compressed blockchain).

I completely agree that there are many applications and situations in which a compressed blockchain would be extremely useful. Please update us on your perspective of the balance between systems that should have the full block chain and ones that should use the compressed version. The way I read the OP proposal -- probably because of my fears -- is that this compressed form is to replace the full block chain in all instances.


jimbobway
Legendary
*
Offline Offline

Activity: 1304
Merit: 1014



View Profile
July 27, 2012, 09:48:06 PM
 #161

Before I read this I just want to quickly post that, no matter whether justifiably or unjustifiably, I personally feel like this is the most pressing issue when it comes to Bitcoin's successful future, and I really hope the core team has an order of priorities planned accordingly.

I too believe this is a critical issue for Bitcoin as a whole.  I had floated the idea in the past that handling blockchain size was critical, but other issues seemed more pressing for the devs at the time -- I didn't have a solid idea to promote, and the blockchain size wasn't so out of hand yet.

One nice benefit of this solution is that because it's an alt-chain, technically no core devs have to be on-board.  It can be done completely independently and operate completely non-disruptively, even with only the support of other devs who believe in it.  I'd certainly like to get core devs interested in it, as they are very smart people who probably have a lot of good ideas to add.  But one of the biggest upsides here is that it can be done completely independently.

Read some more of your proposal today and I now have a better idea of how it works and what you are accomplishing. Good proposal BTW.
I'm still very concerned that this development will make it so the full blockchain database will not be as accessible for download.
It's important for Bitcoin that anyone can fully audit the blockchain and need only trust math and cryptography, not the miners (even though it would be an extremely low probability that enough miners would conspire to get coins in the compressed blockchain).

I completely agree that there are many applications and situations in which a compressed blockchain would be extremely useful. Please update us on your perspective of the balance between systems that should have the full block chain and ones that should use the compressed version. The way I read the OP proposal -- probably because of my fears -- is that this compressed form is to replace the full block chain in all instances.




The transaction cost just needs to be increased so it serves as an incentive for miners to support the main block chain...I think.
ripper234
Legendary
*
Offline Offline

Activity: 1358
Merit: 1003


Ron Gross


View Profile WWW
July 28, 2012, 12:08:52 PM
 #162

Finally had the time to parse and understand the first message in this thread.

+1 for the initiative, it's a good addition to Bitcoin.

Does this information appear in the wiki somewhere?
Does anyone care to TL;DR the rest of the thread for me?

Is there a bounty jar for this? (We can use Booster.io to open one)

Please do not pm me, use ron@bitcoin.org.il instead
Mastercoin Executive Director
Co-founder of the Israeli Bitcoin Association
etotheipi (OP)
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1093


Core Armory Developer


View Profile WWW
July 28, 2012, 02:28:24 PM
 #163

FYI, I have been too swamped with work-work and keeping Armory from breaking under the increased blockchain load to spend too much time on this specific proposal.  I have not lost interest by any means, I just need to catch my breath after some big deadlines.  Then I'll be taking some time off to work explicitly on some Bitcoin stuff... including this proposal.

I'm going to spend some time in the near future looking at the space efficiency of a couple of variants of the trie data-structure.  I'm not sure exactly how this theoretical datastructure can be merged with a disk-based DB engine (I imagine that what I have in mind is not used by an existing, acceptable DB engine), but maybe there's a way to make a hybrid.  This is a problem that still needs to be resolved before we can move forward with an implementation:  once we agree on a datastructure, how do we use it while avoiding re-inventing the wheel in terms of robust, scalable disk-based database engines?

The more I've been thinking about it, the more I have become convinced that a trie-like structure is necessary.  Not only are query and insert times constant, the determinism of tree structure for a given set of tree nodes means that queries and inserts can be parallelized.  For instance, the tree could be implemented with the first layer (the first 256 nodes of a 256-way trie) split into different files/DBs/processes/servers for each branch.  Then every new piece can be distributed and queued for its particular destination.  It could even be distributed amongst different, independent servers.  This seems advantageous.
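A sketch of that dispatch idea, assuming a 256-way trie whose top level is split into independent shards (ShardedTrie, make_shard, and the shard interface are all hypothetical):

Code:
class ShardedTrie:
    # Route every key to one of 256 independent sub-tries by its first
    # byte; each shard could live in its own file, process, or server.
    def __init__(self, make_shard):
        self.shards = [make_shard() for _ in range(256)]

    def insert(self, key, value):
        self.shards[key[0]].insert(key[1:], value)

    def get(self, key):
        return self.shards[key[0]].get(key[1:])

    def root_hash(self, H):
        # The tree root is one hash over the 256 shard roots, so shards
        # can be updated in parallel and combined at the very end.
        return H(b''.join(shard.root_hash(H) for shard in self.shards))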

For reference, I learned of this particular tree in my data structures class in college, but I find no reference that anyone else has ever heard of it.  In my class it was called a "De La Brandia Tree/Trie".  A Patricia tree is a level-compressed trie.  A "de la brandia tree" is a Patricia tree that uses a linked-list of pointers, instead of a constant-size array.  i.e. -- in a 256-way trie or patricia tree, each branch node contains 256 pointers (8 bytes each), which could point to other nodes.  However, the sparseness of lower-level nodes means that the nodes will frequently have 255 null pointers, with only one relevant pointer.  The de-la-brandia tree will represent all child pointers with an ordered linked list.  It has some extra overhead for nodes that have mostly-full child lists, but I think that overhead on near-full nodes will be tiny compared to the amount of space saved on sparse nodes.
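A rough sketch of such a node, assuming each branch keeps a sorted linked list of (byte, child) entries instead of a 256-slot pointer array (invented structure, not Armory code):

Code:
class Child:
    __slots__ = ('byte', 'node', 'next')
    def __init__(self, byte, node, nxt=None):
        self.byte, self.node, self.next = byte, node, nxt

class BriandaisNode:
    def __init__(self):
        self.first = None  # head of the sorted child list

    def get_child(self, byte):
        c = self.first
        while c is not None and c.byte < byte:
            c = c.next
        return c.node if c is not None and c.byte == byte else None

    def add_child(self, byte, node):
        # Insert while keeping the list sorted, so that iteration (and
        # therefore hashing) order stays deterministic.
        prev, c = None, self.first
        while c is not None and c.byte < byte:
            prev, c = c, c.next
        link = Child(byte, node, c)
        if prev is None:
            self.first = link
        else:
            prev.next = link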

When I get some more time, I'll make some pictures, and update the original post with the updated proposal.  

Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
Serith
Sr. Member
****
Offline Offline

Activity: 269
Merit: 250


View Profile
July 28, 2012, 03:39:46 PM
Last edit: July 28, 2012, 04:06:57 PM by Serith
 #164

For reference, I learned of this particular tree in my data structures class in college, but I find no reference that anyone else has ever heard of it.  In my class it was called a "De La Brandia Tree/Trie".

Did you notice that over the last 2-3 years google search gradually became really smart?  It feels like there is a full-scale AI behind it.  I found the data structure that you were looking for: it's called a "De la Briandais" tree.
jimbobway
Legendary
*
Offline Offline

Activity: 1304
Merit: 1014



View Profile
July 28, 2012, 04:01:59 PM
 #165

Not sure if this helps:

http://wiki.postgresql.org/wiki/IndexingXMLData#Patricia_Trie
http://stackoverflow.com/questions/355051/how-do-you-store-a-trie-in-a-relational-database

etotheipi (OP)
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1093


Core Armory Developer


View Profile WWW
July 28, 2012, 07:01:02 PM
 #166

For reference, I learned of this particular tree in my data structures class in college, but I find no reference that anyone else has ever heard of it.  In my class it was called a "De La Brandia Tree/Trie".

Did you notice that over the last 2-3 years google search gradually became really smart, it feels like there is a full scale AI behind it. I found the data structure that you were looking for, it's called "De la Briandais" tree.

Oh, if I had stopped typing so fast into google, I probably would've noticed the autocompletion answer.  Interesting that I always thought it was "brandia" instead of "brandais".  That's what I get for never going to class... (though I did stay up all night debugging the insert function for a Patricia tree).

Speaking of that, when I do a search for the correct name, I get a lot of links to the exact class I took when I attended UIUC:
http://www.cs.uiuc.edu/class/fa05/cs225/cs225/_notes/_section/cs225ta4/Documents/bsttries.pdf

It's not the most exhaustive introduction to trie structures, but if you are already familiar with the concepts, you can get the gist of it.  And in fact, I was really proposing the Patricia/Brandais hybrid tree.  A pure "Brandais" tree uses linked lists but is not level-compressed.

The linked list may also make it easier to combine children to produce the "hash" of a particular node:  you only need to concatenate the non-null children's hashes (and the "skip string"), which will frequently be very few elements.  And in those cases, you just hash consecutively through the linked list.  The dense nodes near the top can be cached for when they need to be recalculated.  This will also reduce the amount of data that needs to be transmitted to communicate a branch of the tree to a node (though it's probably still more than I originally estimated).
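Sketching that hashing rule against a linked-list node like the one above (H, skip_string, leaf_value, first, byte, node are all assumed fields, just for illustration):

Code:
import hashlib

def H(b):
    return hashlib.sha256(b).digest()

def node_hash(node):
    # Leaf: hash the skip string plus the leaf payload (e.g. the root of
    # that address's unspent-TxOut subtree).
    if node.first is None:
        return H(node.skip_string + node.leaf_value)
    # Branch: concatenate the skip string with each non-null child's byte
    # and hash, walking the sorted linked list -- usually only a few entries.
    parts = [node.skip_string]
    child = node.first
    while child is not None:
        parts.append(bytes([child.byte]) + node_hash(child.node))
        child = child.next
    return H(b''.join(parts))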

Unfortunately, my understanding of the correct path forward once these structures need to move from RAM to disk is beyond me. 

Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
allten
Sr. Member
****
Offline Offline

Activity: 455
Merit: 250


You Don't Bitcoin 'till You Mint Coin


View Profile WWW
July 28, 2012, 08:03:06 PM
 #167

Before I read this I just want to quickly post that, no matter whether justifiably or unjustifiably, I personally feel like this is the most pressing issue when it comes to Bitcoin's successful future, and I really hope the core team has an order of priorities planned accordingly.

I too believe this is a critical issue for Bitcoin as a whole.  I had floated the idea in the past that handling blockchain size was critical, but other issues seemed more pressing for the devs at the time -- I didn't have a solid idea to promote, and the blockchain size wasn't so out of hand yet.

One nice benefit of this solution is that because it's an alt-chain, technically no core devs have to be on-board.  It can be done completely independently and operate completely non-disruptively, even with only the support of other devs who believe in it.  I'd certainly like to get core devs interested in it, as they are very smart people who probably have a lot of good ideas to add.  But one of the biggest upsides here is that it can be done completely independently.

Read some more of your proposal today and I now have a better idea of how it works and what you are accomplishing. Good proposal BTW.
I'm still very concerned that this development will make it so the full blockchain database will not be as accessible for download.
It's important for Bitcoin that anyone can fully audit the blockchain and need only trust math and cryptography, not the miners (even though it would be an extremely low probability that enough miners would conspire to get coins in the compressed blockchain).

I completely agree that there are many applications and situations in which a compressed blockchain would be extremely useful. Please update us on your perspective of the balance between systems that should have the full block chain and ones that should use the compressed version. The way I read the OP proposal -- probably because of my fears -- is that this compressed form is to replace the full block chain in all instances.




The transaction cost just needs to be increased so it serves as an incentive for miners to support the main block chain...I think.

yes, that is the correct answer. I'm just concerned that miners will be convinced that they are "supporting" the block chain with only the compressed/pruned version on their disk drive.
That is why I'm asking etotheipi to clarify how this will be presented to the miner community. That is, it's a tool for the miners and everyone else, but the miners should still support the full blockchain.
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1135


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
July 28, 2012, 09:54:36 PM
 #168

yes, that is the correct answer. I'm just concerned that miners will be convinced that they are "supporting" the block chain with only the compressed/pruned version on their disk drive.
That is why I'm asking etotheipi to clarify how this will be presented to the miner community. That is, it's a tool for the miners and everyone else, but the miners should still support the full blockchain.

The way I understand it, the miners will be supporting the block chain, regardless of whether they have part of it or all of it.

The only thing a miner is supporting at any given time is the hash of the latest block and the new transactions he is adding into his block.  Nothing more.  All of the history is covered simply by reference to the prior block hash.  Whether or not he has a local copy of the first billion rolls of Satoshi Dice on his hard drive is irrelevant toward his ability to mine.

I don't worry for one bit that the original unabridged block chain will ever go extinct.  Enough people care about it, the cost to maintain it is low, all it takes is one historian to seed it for the rest of everyone else and everyone who wants it will have it.

The way I see it, the only real critical reason one should demand full blocks all the way to point-in-time X is to maximize the probability that he is not being fed an attack fork without enough information to detect it.  A reasonable hunch for a good value of X might be a week for the average client, and a few months for a miner.  It could be argued that someone investing in a serious mining operation (like a pool) "needs" more assurance, but someone running a serious mining operation also likely has the skills to determine for himself whether he has the correct block chain, and that assurance is arguably just as good as having more block history.

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
ripper234
Legendary
*
Offline Offline

Activity: 1358
Merit: 1003


Ron Gross


View Profile WWW
July 29, 2012, 05:22:30 AM
 #169

FYI, I opened a bounty jar for this project at booster.io. Please add a link to the OP.

Quote
Ultimate Blockchain Pruning is a proposed alt-chain data structure that will enhance core Bitcoin scalability and allow for trust-free light clients. It does not compete with Bitcoin, but rather complements and strengthens it.

This bounty will be awarded to the first person or group who completes all these tasks:
1. Implement UBP
2. Get at least 15% of the hash power to merge-mine it
3. Patch at least one major Bitcoin client to support UBP mode
4. Benchmark the result and show an improvement of at least 10% in downloading the blockchain from scratch

This is quite an undertaking ... so you better donate if you want to encourage this idea.

Please do not pm me, use ron@bitcoin.org.il instead
Mastercoin Executive Director
Co-founder of the Israeli Bitcoin Association
maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1011


View Profile
July 29, 2012, 06:30:56 AM
Last edit: July 29, 2012, 07:47:35 PM by maaku
 #170

FYI, I opened a bounty jar for this project at booster.io. Please add a link to the OP.
Sent 2.5BTC.

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
btharper
Sr. Member
****
Offline Offline

Activity: 389
Merit: 250



View Profile
July 29, 2012, 10:40:03 PM
 #171

Since the alt-chain can only update as often as the main chain, would using a different difficulty mechanism make sense? Of course, whether or not merged mining is used matters.

For example: Instead of saying that the first hash below difficulty wins, is there any way to say that the absolute lowest hash wins? The biggest issue I can see without an obvious answer is how to keep the chain from backpedaling if someone releases a better block N-1 while everyone else is working on block N, but it might just be a variant on the 51% attack.

TL;DR - would a different difficulty mechanism be warranted?
maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1011


View Profile
July 29, 2012, 11:19:45 PM
 #172

No, it'll work as-is. The alt-chain mints blocks independently of the main-chain (excepting merged-mined blocks where they are both minted at the same time).

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
btharper
Sr. Member
****
Offline Offline

Activity: 389
Merit: 250



View Profile
July 30, 2012, 01:04:28 AM
Last edit: July 30, 2012, 03:34:33 AM by btharper
 #173

No, it'll work as-is. The alt-chain mints blocks independently of the main-chain (excepting merged-mined blocks where they are both minted at the same time).
Everything would work as is, but I don't think this chain would work quite the same, since it essentially needs to have the same number of blocks as the primary bitcoin chain. If the alt-chain catches up, there's no incentive, or value, in mining anything else on it, which otherwise "wastes" the workers that are participating on the chain.

My main point is that linking the block count to the main chain one-to-one changes things somewhat. Or is this not as big of an issue as I'm thinking it is?

Edit: Fixed typos made while using my phone.
gmaxwell
Moderator
Legendary
*
expert
Offline Offline

Activity: 4158
Merit: 8343



View Profile WWW
July 30, 2012, 01:54:13 AM
 #174

The alt-chain mints blocks independently of the main-chain

An alt-"chain" is probably the wrong way to think of this.  It's committed data. There doesn't need to be a chain.  Each bitcoin block could have 0 or 1 tree commitments of a given type.
btharper
Sr. Member
****
Offline Offline

Activity: 389
Merit: 250



View Profile
July 30, 2012, 03:44:31 AM
 #175

The alt-chain mints blocks independently of the main-chain

An alt-"chain" is probably the wrong way to think of this.  It's committed data. There doesn't need to be a chain.  Each bitcoin block could have 0 or 1 tree commitments of a given type.

The chain has established itself as a good proof-of-work system, which is the largest reason to stick with it that I can see right off hand. However, it may run more like P2Pool in that storing old blocks (other than headers) may be slightly useless, or at least contrary to the goal of eliminating extra data. Setting up the alt-chain with either "none" or "some" updates may be the way to go to preserve simplicity compared to normal chains. For one of the datastructure guys, would there be an advantage in marking updates as "a few" vs "many" in terms of packing the data? Maybe something else worth looking into for someone who knows how the current setup would work.

As a much more random aside, any idea what the alt-chain coins could be used for, and whether transfer would be worthwhile? If transfer doesn't matter, just give 1 coin per block and let people sign with their keys how many they've accumulated helping to secure the chain for lite nodes.  Smiley
DiThi
Full Member
***
Offline Offline

Activity: 156
Merit: 100

Firstbits: 1dithi


View Profile
July 30, 2012, 12:03:57 PM
 #176

The "alt-chain" term confuses people. It's more like a "sub-chain" of the main blockchain. It doesn't generate any coins at all, and it's a temporal fix until the majority of miners support it.

1DiThiTXZpNmmoGF2dTfSku3EWGsWHCjwt
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1135


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
July 30, 2012, 04:07:18 PM
 #177

The alt-chain mints blocks independently of the main-chain

An alt-"chain" is probably the wrong way to think of this.  It's committed data. There doesn't need to be a chain.  Each bitcoin block could have 0 or 1 tree commitments of a given type.

+100

The term that makes the most sense for me is a "meta-tree".  It would never be "mined" - it would simply be committed to in normal blocks - optionally - at least until it is proven to work in practice and a decision is made to make it a mandatory part of the protocol.


Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
gmaxwell
Moderator
Legendary
*
expert
Offline Offline

Activity: 4158
Merit: 8343



View Profile WWW
July 31, 2012, 03:44:42 PM
 #178

One recent revelation I've had as a result of Pieter's ultraprune implementation is that any tree commitment scheme should also commit to undo logs so that nodes don't necessarily have to all individually store all the data required to reorg forever.
etotheipi (OP)
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1093


Core Armory Developer


View Profile WWW
July 31, 2012, 06:10:57 PM
 #179

One recent revelation I've had as a result of Pieter's ultraprune implementation is that any tree commitment scheme should also commit to undo logs so that nodes don't necessarily have to all individually store all the data required to reorg forever.

Perhaps that's another reason to use an insert-order-invariant tree structure.  If you have to reverse some transactions, you don't have to worry about how you undo them.  Undoing a block/tx is just as easy as adding it in the first place.

Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
gmaxwell
Moderator
Legendary
*
expert
Offline Offline

Activity: 4158
Merit: 8343



View Profile WWW
July 31, 2012, 08:28:12 PM
 #180

Perhaps that's another reason to use an insert-order-invariant tree structure.  If you have to reverse some transactions, you don't have to worry about how you undo them.  Undoing a block/tx is just as easy as adding it in the first place.

You still have to have the complete data that you would remove. E.g. when I spend txn X, I don't specify all of X's data to spend it (that's buried elsewhere in the chain), only the hash. Order invariance wouldn't let me recover that. I need some kind of undo data, even if it's just the location of the original txn so that I could fetch it. (Though it's more efficient if you can serve me up a whole undo block.)
etotheipi (OP)
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1093


Core Armory Developer


View Profile WWW
July 31, 2012, 08:32:12 PM
Last edit: August 01, 2012, 09:50:41 AM by etotheipi
 #181

Perhaps that's another reason to use an insert-order-invariant tree structure.  If you have to reverse some transactions, you don't have to worry about how you undo them.  Undoing a block/tx is just as easy as adding it in the first place.

You still have to have the complete data that you would remove. E.g. when I spend txn X, I don't specify all of X's data to spend it (that's buried elsewhere in the chain), only the hash. Order invariance wouldn't let me recover that. I need some kind of undo data, even if it's just the location of the original txn so that I could fetch it. (Though it's more efficient if you can serve me up a whole undo block.)

Yeah, I actually realized that and was editing my previous post to say this:

It's not actually trivial to reverse blocks in any particular pure-pruned scheme, since adding blocks involves deleting UTXOs.  So reversing the block means you have to re-add UTXOs that are not defined by the block you are reversing (it only references the OutPoint by hash:index, not the whole UTXO).  So you have to either save them or request them from another node.  Perhaps the solution is to keep a circular buffer of the last N UTXOs that were removed, as long as N is enough to cover the last, say, 50 blocks.  Any time you use map.delete when updating the tree, you use a buffer.add to save it (which will also discard the oldest element in the buffer).  Then when you need to reverse a tx, you know the OutPoints that need to be re-added, and if N is large enough, you'll have them in the buffer.  Worst case, you have to fetch some data from another node, which isn't terrible.
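A minimal sketch of that buffer, using a bounded deque (the constants and names are illustrative only):

Code:
from collections import deque

REORG_DEPTH = 50             # blocks we want to be able to undo locally
MAX_SPENDS_PER_BLOCK = 7200  # rough worst case for a 1MB block

class SpentUtxoBuffer:
    # Remember recently deleted UTXOs so a reorg can re-add them.
    def __init__(self):
        self.buf = deque(maxlen=REORG_DEPTH * MAX_SPENDS_PER_BLOCK)

    def on_delete(self, outpoint, txout):
        self.buf.append((outpoint, txout))  # oldest entries fall off the end

    def recover(self, outpoint):
        # Search newest-first; a miss means the spend is too old and the
        # UTXO has to be fetched from a peer instead.
        for op, txout in reversed(self.buf):
            if op == outpoint:
                return txout
        return None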


Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1135


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
July 31, 2012, 09:13:22 PM
 #182

It's probably fodder for another topic, but I am of the opinion that in the event of a very large reorg (where more than 6 blocks are rolled back), the proper behavior of a bitcoin client should be to stop functioning and demand that a new client be downloaded from the developer(s) of that client, and to assist the user in exporting their wallet and ensuring their replacement client is properly signed by the developer.  This is based on philosophizing that it would be better for the bitcoin network to go down in an orderly fashion in the event of an attack - long enough for developers to agree on countermeasures tailored to the attack - just like the recent space rocket that changed its mind and aborted the launch at the last second - rather than for it to stay up and operate chaotically and at a financial loss to users.

Mentioning that is not intended to derail the thread - I am certain there isn't a consensus on what I just said, and it may very well be an unacceptable idea.

But I am saying it because: If it were ever to be debated and determined that what I threw out IS in fact a good idea, it would also settle the question as to how to ensure the tree can be rolled back.  The answer would be simple: at a minimum, keep a copy of the tree as it looked at the point representing the maximum amount we're willing to roll back without ceasing to function.


Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
jgarzik
Legendary
*
qt
Offline Offline

Activity: 1596
Merit: 1091


View Profile
August 01, 2012, 12:27:12 AM
 #183

It's probably fodder for another topic, but I am of the opinion that in the event of a very large reorg (where more than 6 blocks are rolled back), the proper behavior of a bitcoin client should be to stop functioning and demand that a new client be downloaded from the developer(s) of that client, and to assist the user in exporting their wallet and ensuring their replacement client is properly signed by the developer.  This is based on philosophizing that it would be better for the bitcoin network to go down in an orderly fashion in the event of an attack - long enough for developers to agree on countermeasures tailored to the attack - just like the recent space rocket that changed its mind and aborted the launch at the last second - rather than for it to stay up and operate chaotically and at a financial loss to users.

During the output overflow incident, we were all rather glad that the collective client base did not do this.  Once miners upgraded, everyone reorg'd back into working order.

This is just one of many reasons why there is a 120-block new-bitcoin confirmation policy in place.


Jeff Garzik, Bloq CEO, former bitcoin core dev team; opinions are my own.
Visit bloq.com / metronome.io
Donations / tip jar: 1BrufViLKnSWtuWGkryPsKsxonV2NQ7Tcj
etotheipi (OP)
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1093


Core Armory Developer


View Profile WWW
August 01, 2012, 09:39:48 AM
Last edit: August 01, 2012, 09:51:36 AM by etotheipi
 #184

But I am saying it because: If it were ever to be debated and determined that what I threw out IS in fact a good idea, it would also settle the question as to how to ensure the tree can be rolled back.  The answer would be simple: at a minimum, keep a copy of the tree as it looked at the point representing the maximum amount we're willing to roll back without ceasing to function.

I think it's an interesting idea, and as you suggested, I don't want to derail the thread.  However, I think that the amount of history to maintain is not a critical question.  Standard reorgs on the main chain are rarely more than 1 block.  We save enough info for 10 blocks, and if a reorg happens further back than that (and the client is willing to continue), then the node can just request the missing information from peers.  It's all verifiable information, and if you are trying to catch up to the longest chain there should be plenty of peers who can supply it.


Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
socrates1024
Full Member
***
Offline Offline

Activity: 126
Merit: 108


Andrew Miller


View Profile
August 03, 2012, 09:56:25 PM
 #185

It's probably fodder for another topic, but I am of the opinion that in the event of a very large reorg (where more than 6 blocks are rolled back), the proper behavior of a bitcoin client should be to stop functioning and demand that a new client be downloaded from the developer(s) of that client, and to assist the user in exporting their wallet and ensuring their replacement client is properly signed by the developer.  This is based on philosophizing that it would be better for the bitcoin network to go down in an orderly fashion in the event of an attack - long enough for developers to agree on countermeasures tailored to the attack - just like the recent space rocket that changed its mind and aborted the launch at the last second - rather than for it to stay up and operate chaotically and at a financial loss to users.

Mentioning that is not intended to derail the thread - I am certain there isn't a consensus on what I just said, and it may very well be an unacceptable idea.

But I am saying it because: If it were ever to be debated and determined that what I threw out IS in fact a good idea, it would also settle the question as to how to ensure the tree can be rolled back.  The answer would be simple: at a minimum, keep a copy of the tree as it looked at the point representing the maximum amount we're willing to roll back without ceasing to function.



Perhaps one reason to prefer a balanced binary tree is that if you want to store a snapshot of the tree at every previous block, you only have to store the differences. There is a "Persistent Authenticated Datastructure" that would let you handle rollbacks efficiently.


(This image is from "Persistent Authenticated Dictionaries and their Applications" by Anagnostopoulos Goodrich and Tamassia http://cs.brown.edu/people/aris/pubs/pad.pdf )

I do not know if this is possible with tries.
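The trick in the paper is "path copying": an insert allocates new nodes only along the root-to-leaf path and shares everything else with the old version, so every historical root stays valid. A toy sketch for a plain binary search tree (the idea, not the paper's code):

Code:
class Node:
    __slots__ = ('key', 'left', 'right')
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def insert(root, key):
    # Returns a NEW root; only O(log N) fresh nodes are allocated and
    # every older version of the tree remains intact and queryable.
    if root is None:
        return Node(key)
    if key < root.key:
        return Node(root.key, insert(root.left, key), root.right)
    if key > root.key:
        return Node(root.key, root.left, insert(root.right, key))
    return root  # already present: share the whole subtree

v0 = None
v1 = insert(v0, b'addr1')  # snapshot after block 1
v2 = insert(v1, b'addr2')  # v1 is still usable for a rollback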

amiller on freenode / 19G6VFcV1qZJxe3Swn28xz3F8gDKTznwEM
[my twitter] [research@umd]
I study Merkle trees, credit networks, and Byzantine Consensus algorithms.
etotheipi (OP)
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1093


Core Armory Developer


View Profile WWW
August 05, 2012, 07:06:59 PM
 #186

Perhaps one reason to prefer a balanced binary tree is that if you want to store a snapshot of the tree at every previous block, you only have to store the differences. There is a "Persistent Authenticated Datastructure" that would let you handle rollbacks efficiently.

[Image suppressed]

(This image is from "Persistent Authenticated Dictionaries and their Applications" by Anagnostopoulos Goodrich and Tamassia http://cs.brown.edu/people/aris/pubs/pad.pdf )

I do not know if this is possible with tries.

I'm not 100% sure I understand that picture.  It looks like you are saving the pointer-tree of each state, though there's only ever one copy of each piece of underlying data.  Multiple trees will be maintained, but leaf nodes will point to global copies of TxOut data.   Is that correct?

In that case, saving the state of the tree at a previous time means storing a whole lot of pointer data.  If I consider a basic binary search tree and use 8-byte pointers (assuming it's all in memory), then each node in your binary tree is going to store at least two pointers (ptrLeft & ptrRight) and probably a few more bytes for random stuff like isLeaf and isRedOrBlack, etc.  So I assume 20 bytes per tree node.  For a tree of 2 million unspent TxOuts, that's about 40 MB of storage for each state you want to store (each block).  And that's going to go up linearly with the size of the tree.


(NOTE: I'm discussing this as if it's a single tree full of TxOuts, even though my proposal is about a main tree full of addresses containing unspent TxOuts, and subtrees of TxOuts.  However, the concepts are still valid, it's just easier to make my points as if it's a single tree of TxOuts).

On the other hand, if you're using a trie, you don't have to save the state of the entire trie, since its structure is purely deterministic for a given set of addresses.  You only have to save the nodes that were removed that might have to be re-added later in the event of a rollback.  You look at the OutPoints of the TxIns in each tx of the block that need to be reversed, and re-add them to the trie.  Then remove the ones that were added by those transactions.  These are all standard O(1) trie operations.

Thus, the data that needs to be stored in order to roll back a block in the trie (besides the block data itself) is proportional to the transaction volume on the network.  And transaction volume is capped by network rules.  Currently, each block could be up to 1MB.  If you filled a block full of transactions that were entirely TxIns with nothing else, it would result in removing about 7200 TxOuts from the tree.  That means the absolute upper limit for storage per block you want to be able to roll back is about 250 kB, regardless of tree size.
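The arithmetic behind those figures, spelled out (the per-TxIn and per-TxOut sizes are ballpark assumptions, not exact protocol constants):

Code:
BLOCK_LIMIT = 1000000  # bytes per block
TXIN_SIZE   = 139      # ~ outpoint + minimal signature script
UTXO_RECORD = 35       # ~ stored size of one unspent TxOut

spends_per_block = BLOCK_LIMIT // TXIN_SIZE        # ~7200 spent outputs
undo_per_block   = spends_per_block * UTXO_RECORD  # ~252,000 bytes
print(spends_per_block, undo_per_block)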


Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
socrates1024
Full Member
***
Offline Offline

Activity: 126
Merit: 108


Andrew Miller


View Profile
August 08, 2012, 12:16:45 AM
 #187

I'm not assuming it's all in memory. I'm not talking about pointers, but about a graph of hashes.

To be sure we're on the same page - we are talking about a merkle trie with one hash at the root, and hashes at each level, right?

Also you keep saying O(1) when you mean O(log N). I don't think you would agree with the following statement:
    "Each transaction is associated with a unique hash. There are only 2^256 hashes, therefore the number of transactions is O(1)."

Still, I am leaning towards thinking the simplicity of the trie outweighs any marginal cost benefits of the balanced binary tree. They're of the same order in all operations.

amiller on freenode / 19G6VFcV1qZJxe3Swn28xz3F8gDKTznwEM
[my twitter] [research@umd]
I study Merkle trees, credit networks, and Byzantine Consensus algorithms.
etotheipi (OP)
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1093


Core Armory Developer


View Profile WWW
August 08, 2012, 12:42:27 AM
 #188

I'm not assuming it's all in memory. I'm not talking about pointers, but about a graph of hashes.

To be sure we're on the same page - we are talking about a merkle trie with one hash at the root, right?

There does appear to be a disconnect here.  Let me explain what my understanding of your proposal is:

(1) There is a database of TxOuts.  It has all current, unspent TxOuts in it.  TxOuts are added and deleted (or marked deleted) when blocks come in.
(2) The TxOuts need to be organized into a tree of some sort in order to come up with the single "merkle root" to be included in the meta-chain header ("merkle" is in quotes, because using a binary tree or trie it's no longer a "merkle" tree).  The relationship between DB elements (1) will be represented either as a Binary Search Tree or trie using "pointers".
(3) Thus, after every block, you update the database of TxOuts (1), and the pointers in the binary search tree or trie (2).  I was focusing on (2) when I said that you need to store pointers.  Not necessarily in RAM... but somewhere, data needs to be held that identifies the structure of the tree/trie associated with the TxOuts in database (1).
(4) In a classic binary search tree held in RAM, it uses pointers -- specifically a left pointer and a right pointer for each node.  In this context, whether it's held in RAM or on disk, there's still 8 bytes needed (at minimum) per pointer to represent the relationships.  And a binary tree will use about 1 such pointer per node.

This is not a negligible amount of data.   If TxOuts are 35 bytes, pointers are 8 bytes, then a "tree" of pointers will be about 25% of the size of the entire TxOut database.  It's a rough estimate, not intended to be exact.

So now back to my original point:  it looks to me that in order to "save the state" of the tree, you need to save the entire tree of pointers between nodes (2).  And you need the full tree.  Sure the TxOuts database stores each TxOut once, but saving all the pointers to represent the past states of the tree will triple your space requirements to save just 8 blocks.


Also you keep saying O(1) when you mean O(log N). I don't think you would agree with the following statement:
    "Each transaction is associated with a unique hash. There are only 2^256 hashes, therefore the number of transactions is O(1)."

Another disconnect here.  I'm not sure where your example came from, but here's what I'm referring to -- let's compare a binary search tree full of Bitcoin addresses to a trie of Bitcoin addresses (the entire set of all addresses on the network containing unspent TxOuts).  Assume at a given time there are N such addresses.

Binary Search Tree (red-black or any balanced tree) -- depth is O(log(N)).   
   -- Querying an element is O(logN)
   -- Inserting an element is O(logN)
   -- Deleting an element is O(logN) 
Basic trie:  depth of the tree is equal to the length of the longest key:  so the depth is 20 (because addresses are 20 bytes).
   -- Querying an element is O(1) -- it takes 20 hops to get from the root node to the leaf you want
   -- Inserting an element is O(1) -- it takes 20 hops to get from the root node to the leaf you want
   -- Deleting an element is O(1) -- it takes 20 hops to get from the root node to the leaf you want

So even if there are 100 quintillion bitcoin addresses with unspent TxOuts in the master alt-chain tree, it still takes you exactly 20 hops to query, insert or delete a value in the trie (and less if you're using a patricia tree).

Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
socrates1024
Full Member
***
Offline Offline

Activity: 126
Merit: 108


Andrew Miller


View Profile
August 08, 2012, 12:46:23 AM
 #189

Could you describe how you would update the root hash for the trie?

Also, it most certainly takes fewer than 40 hops* to get from the root of the balanced binary tree to any leaf. Log N is less than 20.

*Assuming one hop is actually "8 hops". I don't feel like I'm making this point too clear. You can have a 'balanced B tree' that has up to 8 children at each level. Then there are only 20 hops.

Or to put it another way, if there are so many transactions that we have collisions by birthday paradox, then we would need to pick a bigger hash.

amiller on freenode / 19G6VFcV1qZJxe3Swn28xz3F8gDKTznwEM
[my twitter] [research@umd]
I study Merkle trees, credit networks, and Byzantine Consensus algorithms.
etotheipi (OP)
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1093


Core Armory Developer


View Profile WWW
August 08, 2012, 01:25:33 AM
Merited by ABCbits (1)
 #190

Could you describe how you would update the root hash for the trie?

In both BSTs and trie structures there will be leaf nodes and branch nodes.  In some structures (such as the BST) branch nodes may also contain leaf data.  In the context of this proposal, the master tree will be the set of all addresses containing any unspent TxOuts.  So a given leaf represents one such address.  

--For the purposes of organizing the tree/trie, that leaf's value will be the 20-byte address.  So the 20-byte address is the "key" of the master tree.
--For the purposes of computing the root hash, that leaf's value is the root of the subtree of unspent TxOuts for that address (which is constructed in a similar way to the address tree)
--For the purposes of computing the root hash, all branch nodes' hash values are the hash of the concatenation of its children's hash values.

In the simplest case, you have only pure branch nodes and pure leaf nodes.   This looks very much like the merkle trees we already use.  Then each node's "hash value" is the hash of the two children "hash values" concatenated.  [LeftPtrNodeValue || RightPtrNodeValue].  You walk up the tree computing this for every node until you get to the root.

In the BST case, where nodes are actually both branch nodes and leaf nodes, you just use hash([LeafValue | LeftPtrNodeValue | RightPtrNodeValue]).  

In the Patricia/Hybrid Tree case, there are purely branch nodes and leaf nodes, though the branch nodes may have "skip strings".  So a leaf node's hash value is just the root hash of the subtree.  And a branch node's value is the hash of the concatenated skip string and its non-null child node values.

When you add or remove a node from the tree, you are changing the hash value of its parent, which changes its parent, etc.  So for a BST, to add or delete a node you have to recompute/update O(log N) other nodes' hash values to get the new root value.  Same thing for the trie, except there are exactly 20 parents to update, regardless of the size of the tree (and see below for why this will ultimately be only 4-6).
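To pin the hashing rule down, a rough sketch in Python (mine, for illustration only; the names and example strings are made up):

Code:
import hashlib

def sha256(data):
    return hashlib.sha256(data).digest()

def branch_value(skip_string, child_values):
    # Branch node rule from above: hash of the skip string concatenated
    # with the hash values of its non-null children, in child order.
    return sha256(skip_string + b''.join(child_values))

# A leaf change only touches the leaf-to-root path.  With a two-level
# path it propagates like this (4-6 such steps in the real tree):
leaf     = sha256(b'root of some address subtree')
parent   = branch_value(b'3b', [leaf])
sibling  = sha256(b'value of an untouched sibling')
new_root = branch_value(b'', [parent, sibling])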

I really need to make a picture...  (coming soon)


It most certainly takes fewer than 40 hops to get from the root of the balanced binary tree to any leaf. Log N is less than 20.

Right, the number of hops in a red-black tree will be 2*log N worst case.  So your point is well-received that N will probably never get high enough in this environment for the query/insert/delete time of a basic trie to absolutely beat the binary search tree.  Both of them have completely acceptable query/insert/delete times.  So two things to mention:
(1) It's still entirely accurate to label trie operations as O(1).  It just may not be relevant for this application.
(2) I'm actually proposing a Patricia tree, which is level-compressed.  So a typical access time will be 4-6 hops.  Even with trillions of nodes in the tree.  The absolute max is 20, but it would never be realized.

Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
ripper234
Legendary
*
Offline Offline

Activity: 1358
Merit: 1003


Ron Gross


View Profile WWW
August 08, 2012, 04:59:37 AM
 #191

Perhaps it's time to formulate this in the wiki?
Maybe as a BIP draft?

Or is it too soon?

Please do not pm me, use ron@bitcoin.org.il instead
Mastercoin Executive Director
Co-founder of the Israeli Bitcoin Association
etotheipi (OP)
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1093


Core Armory Developer


View Profile WWW
August 08, 2012, 05:07:56 AM
Last edit: January 10, 2015, 03:41:18 AM by etotheipi
 #192

Pictures!

I love inkscape.  Here is a visual explanation of tries, Patricia trees, Patricia-Brandais Hybrid trees, and the technique for getting "hash values" out of each node.  I need a better name than "hash values", which is kind of vague... "signature" of the node?  "shape" of the node?  Ehh, whatever, enjoy the pictures!


A basic trie on base-10 keys (which in this proposal would be base-256, and either address strings or TxOut values)



Patricia trees are the same, but with level compression.  This is critical since the trees/tries will be extremely sparse relative to their theoretical capacity of 2^160 or 2^256 nodes.  One remaining problem is that all branch nodes still store 256 pointers, even when 255 of them are NULL.



The hybrid tree is what I would propose.  The pointer-arrays are converted to linked lists and only non-null children are stored



Since this data structure would also be used for the TxOut subtrees, it should have a compact representation for one-node trees.  Indeed it does:



The "values" of each leaf is just the root of the sub tree, and the value of each branch is the skip-string concatenated with all its children's values.




Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
socrates1024
Full Member
***
Offline Offline

Activity: 126
Merit: 108


Andrew Miller


View Profile
August 08, 2012, 05:25:45 AM
 #193

Ah but etotheipi, let's focus on the last image, since that is the first one to mention hashes.

You have taken the hash of the values of the child nodes, but not the hash of the children's hashes. You cannot securely traverse this tree from the root hash.

amiller on freenode / 19G6VFcV1qZJxe3Swn28xz3F8gDKTznwEM
[my twitter] [research@umd]
I study Merkle trees, credit networks, and Byzantine Consensus algorithms.
etotheipi (OP)
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1093


Core Armory Developer


View Profile WWW
August 08, 2012, 05:31:19 AM
 #194

Ah but etotheipi, let's focus on the last image, since that is the first one to mention hashes.

You have taken the hash of the values of the child nodes, but not the hash of the children's hashes. You cannot securely traverse this tree from the root hash.

I don't follow your logic.  Did I miss something?  Can you elaborate?

Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1135


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
August 08, 2012, 05:41:38 AM
 #195

I haven't been following the tree logic as I only understand them to a certain extent, but I just wanted to make sure that whatever tree structure is chosen, that updating the tree to accommodate new incoming transactions is nearly always instantaneous no matter what the size.  You are surely aware that with growth incoming transactions can start to number into the hundreds and the thousands per second.  If updating the tree for each incoming transaction is burdensome in resources, it will create a perverse incentive for miners toward taking shortcuts in transaction processing or omitting transactions from blocks.  I'm sure you already know this, just wanted to make sure this weighed somewhere decent on the requirements list for what structure is eventually chosen.

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
etotheipi (OP)
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1093


Core Armory Developer


View Profile WWW
August 08, 2012, 05:47:07 AM
 #196

I haven't been following the tree logic as I only understand them to a certain extent, but I just wanted to make sure that whatever tree structure is chosen, that updating the tree to accommodate new incoming transactions is nearly always instantaneous no matter what the size.  You are surely aware that with growth incoming transactions can start to number into the hundreds and the thousands per second.  If updating the tree for each incoming transaction is burdensome in resources, it will create a perverse incentive for miners toward taking shortcuts in transaction processing or omitting transactions from blocks.  I'm sure you already know this, just wanted to make sure this weighed somewhere decent on the requirements list for what structure is eventually chosen.

Inserting, deleting or changing nodes only involves modifying the branch on which that node lies.  So you find the node to be modified, then you recalculate its parent, then recalculate its parent, and so on up to the root node.  Given the structure of the tree, that's likely to be only 4-6 node recalculations per update.  Each of those nodes might be dense, meaning you might be concatenating a couple hundred values.  Still, the total computation is completely independent of tree size (well, it will always be less than 20 nodes to update, though the actual number will asymptotically increase towards 20 as the tree gets bigger [though in reality it will probably never exceed 8 even if 100% of the world switched to BTC]).

Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1011


View Profile
August 08, 2012, 06:37:56 AM
 #197

You are surely aware that with growth incoming transactions can start to number into the hundreds and the thousands per second.
Can it really? Current limitations on block sizes limit the number of transactions to no more than a thousand per block, or a few per second. Changing those limitations would result in a hard-fork.

Or am I missing something?

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
socrates1024
Full Member
***
Offline Offline

Activity: 126
Merit: 108


Andrew Miller


View Profile
August 08, 2012, 06:46:51 AM
 #198

I don't follow your logic.  Did I miss something?  Can you elaborate?

Yes! You're missing the step where you take a hash of the children's hashes, not just their values. For example, suppose your root node is fully stocked and contains "0123456789". You seem to be saying its hash would be H("0123456789"). That tells you the values of the children, but it does not let you look up (and validate) the rest of the data in the next node. For a hash tree to work, you need to take the hash-of-a-hash at each step.
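In code, the contrast between the two rules looks like this (a toy sketch, nothing more):

Code:
import hashlib
H = lambda data: hashlib.sha256(data).digest()

# The rule I'm objecting to: the root hashes the children's *values*.
# It commits to the digits 0-9, but says nothing about the subtrees
# hanging below those children.
insecure_root = H(b'0123456789')

# The Merkle rule: the root hashes the children's *hashes*, so a client
# walking down from the root can verify every fetched node against the
# hash stored in its parent.
child_hashes = [H(bytes([d]) + b'...subtree contents...') for d in range(10)]
secure_root = H(b''.join(child_hashes))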

amiller on freenode / 19G6VFcV1qZJxe3Swn28xz3F8gDKTznwEM
[my twitter] [research@umd]
I study Merkle trees, credit networks, and Byzantine Consensus algorithms.
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1135


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
August 08, 2012, 07:29:51 AM
 #199

You are surely aware that with growth incoming transactions can start to number into the hundreds and the thousands per second.
Can it really? Current limitations on block sizes limit the number of transactions to no more than a thousand per block, or a few per second. Changing those limitations would result in a hard-fork.

Or am I missing something?

That's true with regard to current limitations, but one day we'll be thinking about how to scale past them.  It would be preferable for that scaling to happen without the tree algorithm having to be rethought along with it because the chosen candidate turned out to be the slowest link in the chain.

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1135


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
November 04, 2012, 04:29:54 PM
Last edit: November 04, 2012, 04:42:42 PM by casascius
 #200

I know it's been a while since this topic has been visited, but I'd like to make a proposal here:

Rather than settle on the "best" way to implement this tree, how about settle on the "simplest", so that way the community can catch the bug and start cranking their brains on the best way to implement this tree with the right balance of features and complexity when the time comes to consider making it part of the protocol standard.

By "simplest", I mean implemented as follows:

1. A simple binary tree that starts out balanced, but maintaining the balancing from block to block is optional.  A miner has the choice to rebalance the tree for the next block, but doesn't have to.  The lack of a requirement to keep the tree balanced is meant as an effort to discourage mining empty blocks because a miner doesn't want the CPU burden or any delay associated with rebuilding the whole tree with each incoming transaction.

2. No ability to roll back.  Rolling back must be accomplished either by rebuilding the tree from scratch, or by starting with a known good backup and rolling it forward.  Backups are systematically deleted such that the backup burden grows O(log n) relative to the total block height.  More specifically, the backup of any given block's tree should have a lifetime of 2^n blocks where n is the number of contiguous zero bits at the end of the block height.  Block 0x7890's tree backup should last sixteen blocks because 0x7890 ends in four zero bits.  The backup for the tree of block 0x10000 should last 0x10000 blocks.
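In code, the lifetime rule is just a count of trailing zero bits (a quick sketch; the function name is mine):

Code:
def backup_lifetime(height):
    # Lifetime of a block's tree backup: 2^n blocks, where n is the
    # number of contiguous zero bits at the end of the block height.
    if height == 0:
        return float('inf')           # the genesis snapshot never expires
    n = 0
    while height & 1 == 0:
        n += 1
        height >>= 1
    return 2 ** n

assert backup_lifetime(0x7890) == 16          # four trailing zero bits
assert backup_lifetime(0x10000) == 0x10000    # sixteen trailing zero bits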

Now, why would I suggest a methodology that clearly avoids taking advantage of features that would make a "better mousetrap" so to speak?  Here are the most prominent reasons:

1. At some point, the Bitcoin community may come to a consensus that we should redefine a valid Bitcoin block to incorporate the root hash of a valid meta-tree rather than having it be optional coinbase clutter.  Until then, this is merely an optional performance-enhancing and usability-enhancing feature without any hard commitment to a standard.  We should help people understand the base case for what it is, and then move on to derivative cases that do the job better.

2. There is serious value in simplicity.  The more things are needlessly complex, the higher the barrier to entry for new developers of bitcoin software.  We are at a point where we need more developers on board than we need the disk space saved by what would be (for the current block height and all block heights for the foreseeable future) about 20 backups of the meta tree on each user's disk.  Besides being much more difficult for the average developer to understand, requiring a tree that must do a moonwalk during a rare edge case which is very difficult for a developer to reproduce and test makes for an exploitable danger that implementations may fail to do the right thing when the right thing is needed the most.

3. The Bitcoin community could use the lessons learned in a basic "proof of concept" implementation of this without being committed to any specific methodology for optimizing it.  This will help the community at large understand which use cases evolve from the availability of the tree, and then come to an intelligent consensus as to what features and attributes of a meta tree are the most valuable and which bargains of complexity versus power are worth making.

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
etotheipi (OP)
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1093


Core Armory Developer


View Profile WWW
November 04, 2012, 05:47:44 PM
Last edit: November 04, 2012, 06:08:25 PM by etotheipi
 #201

I know it's been a while since this topic has been visited, but I'd like to make a proposal here:

Rather than settle on the "best" way to implement this tree, how about settle on the "simplest", so that way the community can catch the bug and start cranking their brains on the best way to implement this tree with the right balance of features and complexity when the time comes to consider making it part of the protocol standard.

By "simplest", I mean implemented as follows:

1. A simple binary tree that starts out balanced, but maintaining the balancing from block to block is optional.  A miner has the choice to rebalance the tree for the next block, but doesn't have to.  The lack of a requirement to keep the tree balanced is meant as an effort to discourage mining empty blocks because a miner doesn't want the CPU burden or any delay associated with rebuilding the whole tree with each incoming transaction.

2. No ability to roll back.  Rolling back must be accomplished either by rebuilding the tree from scratch, or by starting with a known good backup and rolling it forward.  Backups are systematically deleted such that the backup burden grows O(log n) relative to the total block height.  More specifically, the backup of any given block's tree should have a lifetime of 2^n blocks where n is the number of contiguous zero bits at the end of the block height.  Block 0x7890's tree backup should last sixteen blocks because 0x7890 ends in four zero bits.  The backup for the tree of block 0x10000 should last 0x10000 blocks.

Now, why would I suggest a methodology that clearly avoids taking advantage of features that would make a "better mousetrap" so to speak?  Here are the most prominent reasons:

1. At some point, the Bitcoin community may come to a consensus that we should redefine a valid Bitcoin block to incorporate the root hash of a valid meta-tree rather than having it be optional coinbase clutter.  Until then, this is merely an optional performance-enhancing and usability-enhancing feature without any hard commitment to a standard.  We should help people understand the base case for what it is, and then move on to derivative cases that do the job better.

2. There is serious value in simplicity.  The more things are needlessly complex, the higher the barrier to entry for new developers of bitcoin software.  We are at a point where we need more developers on board than we need the disk space saved by what would be (for the current block height and all block heights for the foreseeable future) about 20 backups of the meta tree on each user's disk.  Besides being much more difficult for the average developer to understand, requiring a tree that must do a moonwalk during a rare edge case which is very difficult for a developer to reproduce and test makes for an exploitable danger that implementations may fail to do the right thing when the right thing is needed the most.

3. The Bitcoin community could use the lessons learned in a basic "proof of concept" implementation of this without being committed to any specific methodology for optimizing it.  This will help the community at large understand which use cases evolve from the availability of the tree, and then come to an intelligent consensus as to what features and attributes of a meta tree are the most valuable and which bargains of complexity versus power are worth making.

I generally approve of the idea of prototyping the meta-chain CONOPs, and let people/devs start thrashing out how to use it, how to improve it etc.  

However, if you're arguing for simplicity, then you must use the Trie/Patricia/De la Brandais tree.  There is no need for snapshotting/backups.  Put the data in.  If a block has to be rolled back, remove its entries from the tree.  For a given snapshot in time, all Trie-based implementations will agree.  It's part of their design.
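To illustrate what I mean by rollbacks needing no snapshots, a toy sketch (mine, using plain dicts as trie nodes):

Code:
import copy

def insert(trie, key, value):
    node = trie
    for byte in key:
        node = node.setdefault(byte, {})
    node['leaf'] = value

def delete(trie, key):
    # Remove a key, pruning now-empty branch nodes on the way back up.
    path = [trie]
    for byte in key:
        path.append(path[-1][byte])
    del path[-1]['leaf']
    for byte, parent in zip(key[::-1], path[-2::-1]):
        if not parent[byte]:
            del parent[byte]

trie = {}
insert(trie, b'ab', 'TxOut list for address ..ab')
snapshot = copy.deepcopy(trie)           # kept only to check the claim

insert(trie, b'ax', 'TxOut list for address ..ax')   # "apply a block"
delete(trie, b'ax')                                  # "roll it back"
assert trie == snapshot   # identical structure => identical root hash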

This just won't be possible using BST's, though.  It's not a matter of preference, it's a matter of standardization.  If you use a BST, you might be inclined to use STL map<a,b> in C++, or a similar implementation in Java, etc.  But the map<> data structure will be designed/optimized differently for each architecture, compiler, and OS.  There's no guarantee that a BST in Linux using gcc 4.3 will even match the BST implementation in Linux gcc 4.8 -- they might've changed the BST implementation under-the-hood, optimizing the rebalancing operations differently.  And you'd never know, because the underlying tree structure is not specified in the C++ standard for map<>.  Only the expected run times of insert/delete/query/etc.

So, miners won't be able to agree on the root hash unless they all build the BST exactly the same way, so they must agree on the BST algorithm to use.   Who's writing that implementation?  Will they create an implementation of the exact same algorithm in C, Java, C++, haskell, etc?   This is why I keep pushing Trie-based algorithms -- any implementation of the Trie (or whatever variant is agreed upon) will work.  A trie that is implemented in C++ by someone in China can be used to produce the same root hash as a Java implementation written by some kid in his basement in Idaho (assuming the implementations are correct).  

Yes, it's possible with BSTs, but it's a lot of work.  And it's not simple.  To design this data structure around BSTs requires every implementer of the meta-chain to use the same specific BST algorithm.  There is no such ambiguity with Trie structures -- you could look them up in any textbook.

So, I agree that we should do this.  It needs to be done and I think something like this is ultimately the future of BTC (especially lite nodes).  If only I could get all my other priorities in Armory finished, then I would be focusing on this.

Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
socrates1024
Full Member
***
Offline Offline

Activity: 126
Merit: 108


Andrew Miller


View Profile
November 04, 2012, 06:08:41 PM
 #202

Yes, it's possible with BSTs, but it's a lot of work.  And it's not simple.  To design this data structure around BSTs requires every implementer of the meta-chain to use the same specific BST algorithm.  There is no such ambiguity with Trie structures -- you could look them up in any textbook.

It's true that every implementer will need to use the same algorithm, otherwise the root hashes will be incompatible. And of course you can't just use a standard library map because those do not support computing hashes!

But there are plenty of ambiguities involved in using tries. Here's an earlier quote from you where you mentioned an optimization using skip strings:
Quote
In the Patricia/Hybrid Tree case, there are purely branch nodes and leaf nodes, though the branch nodes may have "skip strings".  So a leaf node's hash value is just the root hash of the subtree.  And a branch node's value is the hash of the concatenated skip string and its non-null child node values.

Surely if you use skip strings, that will change the root hash, so everyone would have to agree on the particular algorithm to use? Let's make several implementations, and evaluate which one is the best.

By way of an update, I am still making gradual progress on a BST implementation. Previously I had implemented a red-black balanced merkle tree in python and described the complete protocol so that any independent implementation should arrive at the same hash. Unfortunately the implementation was too slow/memory-intensive to get past block 130k or so of the blockchain. Since then, I have made a C++ implementation that's thousands of times faster (it runs within a factor of 3 of the linux kernel LLRB tree, and although it doesn't compute hashes, the structure of the tree will be the same).
https://github.com/amiller/redblackmerkle/blob/llrb/c_redblack.hpp I have also sketched out a solution for I/O efficient validation of a batch of updates involving a priority queue. But I'm not prepared to make a full post on this yet.

I'm pointing all this out so that you can't say no progress is being made! Until someone from the 'trie' camp catches up, the simplest solution is a BST since some code for this already exists.

amiller on freenode / 19G6VFcV1qZJxe3Swn28xz3F8gDKTznwEM
[my twitter] [research@umd]
I study Merkle trees, credit networks, and Byzantine Consensus algorithms.
2112
Legendary
*
Offline Offline

Activity: 2128
Merit: 1060



View Profile
November 04, 2012, 06:25:28 PM
 #203

I generally approve of the idea of prototyping the meta-chain CONOPs, and let people/devs start thrashing out how to use it, how to improve it etc.  

However, if you're arguing for simplicity, then you must use the Trie/Patricia/De la Brandais tree.  There is no need for snapshotting/backups.  Put the data in.  If a block has to be rolled back, remove its entries from the tree.  For a given snapshot in time, all Trie-based implementations will agree.  It's part of their design.

This just won't be possible using BST's, though.  It's not a matter of preference, it's a matter of standardization.  If you use a BST, you might be inclined to use STL map<a,b> in C++, or a similar implementation in Java, etc.  But the map<> data structure will be designed/optimized differently for each architecture, compiler, and OS.  There's no guarantee that a BST in Linux using gcc 4.3 will even match the BST implementation in Linux gcc 4.8 -- they might've changed the BST implementation under-the-hood, optimizing the rebalancing operations differently.  And you'd never know, because the underlying tree structure is not specified in the C++ standard for map<>.  Only the expected run times of insert/delete/query/etc.

So, miners won't be able to agree on the root hash unless they all build the BST exactly the same way, so they must agree on the BST algorithm to use.   Who's writing that implementation?  Will they create an implementation of the exact same algorithm in C, Java, C++, haskell, etc?   This is why I keep pushing Trie-based algorithms -- any implementation of the Trie (or whatever variant is agreed upon) will work.  A trie that is implemented in C++ by someone in China can be used to produce the same root hash as a Java implementation written by some kid in his basement in Idaho (assuming the implementations are correct).  

Yes, it's possible with BSTs, but it's a lot of work.  And it's not simple.  To design this data structure around BSTs requires every implementer of the meta-chain to use the same specific BST algorithm.  There is no such ambiguity with Trie structures -- you could look them up in any textbook.

So, I agree that we should do this.  It needs to be done and I think something like this is ultimately the future of BTC (especially lite nodes).  If only I could get all my other priorities in Armory finished, then I would be focusing on this.
The claim "This just won't be possible using BST's, though." is plain false. It confuses the data structure and algorithm with their implementation. This gotta be some sort of miscommunication, or maybe the author had too much fun at a party yesterday.

Please comment, critique, criticize or ridicule BIP 2112: https://bitcointalk.org/index.php?topic=54382.0
Long-term mining prognosis: https://bitcointalk.org/index.php?topic=91101.0
etotheipi (OP)
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1093


Core Armory Developer


View Profile WWW
November 04, 2012, 06:31:28 PM
 #204

Yes, it's possible with BSTs, but it's a lot of work.  And it's not simple.  To design this data structure around BSTs requires every implementer of the meta-chain to use the same specific BST algorithm.  There is no such ambiguity with Trie structures -- you could look them up in any textbook.

It's true that every implementer will need to use the same algorithm, otherwise the root hashes will be incompatible. And of course you can't just use a standard library map because those do not support computing hashes!

But there are plenty of ambiguities involved in using tries. Here's an earlier quote from you where you mentioned an optimization using skip strings:
Quote
In the Patricia/Hybrid Tree case, there are purely branch nodes and leaf nodes, though the branch nodes may have "skip strings".  So a leaf node's hash value is just the root hash of the subtree.  And a branch node's value is the hash of the concatenated skip string and its non-null child node values.

Surely if you use skip strings, that will change the root hash, so everyone would have to agree on the particular algorithm to use? Let's make several implementations, and evaluate which one is the best.

"Skip strings" are part of the standard Patricia tree implementation.  It just may be called something different in different texts.  Either you use a Trie, Patricia tree, a De la Brandais tree, or a Hybrid tree.  Once the correct one is agreed upon, the ambiguities in implementation shrink to basically nothing.  It becomes a question of how to traverse and aggregate tree data for Bitcoin purposes, not how to implement the data structure.  That's something that will have to be done for any data structure that is used.

On the other hand, a red-black tree that is optimized differently, and thus produces a different root hash, will still be called a red-black tree.  To describe to someone what that optimization is, well, requires a description of the algorithm (and probably code samples).  

I know it's possible, I'm just pointing out the difficulties that could arise from different people unknowingly producing different tree structures.  Most likely it would happen under bizarre conditions with complicated rebalance operations, and it would be remarkably frustrating to debug.


I'm pointing all this out so that you can't say no progress is being made! Until someone from the 'trie' camp catches up, the simplest solution is a BST since some code for this already exists.

I do appreciate that you are doing this.  I wish I had time for it.  Perhaps your C++ implementation is sufficient for porting to other languages, so that such a uniform implementation can be achieved.   Clearly, I'm still opposed to it for other reasons (like necessitating backups/snapshots for re-orgs), but you can still accomplish what is needed.   And I hope that we can hit all the snags and start figuring them out sooner than later.


The claim "This just won't be possible using BST's, though." is plain false. It confuses the data structure and algorithm with their implementation. This gotta be some sort of miscommunication, or maybe the author had too much fun at a party yesterday.

Perhaps you misunderstood me.  I was saying it won't be possible for everyone to agree on the root hash unless they use the exact same implementation.  Standard implementations (such as STL map<>) are standardized only in run-time, not underlying structure.  Thus, a widespread "experiment" using BSTs won't be simple without a uniform implementation across all languages, etc.  This may be a lot of work.  However, if it were tries, I consider it quite simple: anyone can download any [correct] trie implementation from anywhere, and know that they will get the same answer.  Because the structure is guaranteed.


Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
2112
Legendary
*
Offline Offline

Activity: 2128
Merit: 1060



View Profile
November 04, 2012, 06:41:57 PM
Last edit: November 04, 2012, 07:01:56 PM by 2112
 #205

Because the structure is guaranteed.
I am not buying this guarantee. In this Bitcoin milieu I've seen so much confused-endian (or accidental-endian) code that I will not take anything for granted, least of all that there will be a sensible agreement on how to represent the bignums.

Edit: Or maybe I will state it like this: The structure will be guaranteed so long as any implementor is confused the same way as the original authors of the Satoshi client and finds the implementation errors made in the original client obvious and natural.

Please comment, critique, criticize or ridicule BIP 2112: https://bitcointalk.org/index.php?topic=54382.0
Long-term mining prognosis: https://bitcointalk.org/index.php?topic=91101.0
Pieter Wuille
Legendary
*
qt
Offline Offline

Activity: 1072
Merit: 1170


View Profile WWW
November 04, 2012, 06:44:46 PM
 #206

Obviously a fully-specified algorithm will need to be decided upon. That is no different from the definition of the Merkle tree implementation (together with its weird odd-number-of-hashes-on-a-level behaviour) currently in use by Bitcoin.

Obviously implementations will be necessary for this, and one will not be able to reuse standard libraries that have not fully specified behaviour (or don't have authentication built in). It will almost certainly require a specific implementation in every language clients need. This has dangers, but it's not impossible. Extensive test sets will be needed that have 100% coverage in a reference implementation, to be sure every edge case or weird rebalancing rule is tested.

My gut feeling makes me prefer trie/patricia based solutions, because their structure is independent of the history that led to their contents. This means a simple diff of the set represented by two tries is enough to specify the modification from one to the other. Authenticated balanced trees, on the other hand, do expose their history through the merkle root, and differences need to contain structural information. This is not a good reason, just a preference, and the one who implements the first usable and acceptable solution will probably get to specify what data structure is chosen. Fully deterministic set representation may make an implementation easier to test, though.
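A toy sketch of that history-independence (mine; dicts as trie nodes, and a deliberately naive unbalanced BST for contrast):

Code:
def trie_insert(trie, key):
    node = trie
    for byte in key:
        node = node.setdefault(byte, {})
    node['end'] = True

t1, t2 = {}, {}
for k in [b'abc', b'abd', b'xyz']:
    trie_insert(t1, k)
for k in [b'xyz', b'abd', b'abc']:
    trie_insert(t2, k)
assert t1 == t2   # same set => same structure, whatever the history

def bst_insert(node, key):
    # naive unbalanced BST; insertion order shapes the tree
    if node is None:
        return {'key': key, 'left': None, 'right': None}
    side = 'left' if key < node['key'] else 'right'
    node[side] = bst_insert(node[side], key)
    return node

b1 = b2 = None
for k in [b'abc', b'abd', b'xyz']:
    b1 = bst_insert(b1, k)
for k in [b'xyz', b'abd', b'abc']:
    b2 = bst_insert(b2, k)
assert b1 != b2   # same set, different shape => different merkle root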

I do Bitcoin stuff.
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1135


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
November 04, 2012, 06:45:19 PM
 #207

By recommending a binary tree, or rather, a Merkle tree in the form most resembling a binary tree, I am suggesting that from block to block, the miner has the option of either just updating the tree (in a well-defined deterministic manner but leaving it unbalanced) or updating and rebalancing the tree such that all of the leaf nodes are the same distance (or distance-1) from the root, again in a well-defined deterministic manner.

I am not suggesting leaving the implementation up to the STL or any other form of library.

I don't believe Patricia trees are "simpler" when measured in the amount of human learning and neurons one must dedicate to understanding the concept.  That doesn't mean I think it's too hard to learn, but rather, I doubt the cost (measured in terms of complexity of the specification) is worth the benefit.

If you tell a developer, "Now you've got to learn what a Patricia tree is to write a Bitcoin client", and then "Now that you've implemented it, you've got to simulate numerous cases of rollbacks to test and feel good that your implementation works backwards as well as forward" you have just made many more developers say "to hell with it, I'll develop something else".

... not to mention implementing a chain reorg strategy consisting of "now talk to your peers and start asking them for now-orphaned blocks (hopefully they have them still), preferably delivered in reverse order, so you can roll your tree back intact" rather than starting with a copy of the tree at or before the point in time of the split and rolling it forward.

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
jojkaart
Member
**
Offline Offline

Activity: 97
Merit: 10


View Profile
November 04, 2012, 06:52:22 PM
 #208

If you tell a developer, "Now you've got to learn what a Patricia tree is", and then "Now that you've implemented it, you've got to simulate numerous cases of rollbacks to test and feel good that your implementation works backwards as well as forward" you have just made many more developers say "to hell with it, I'll develop something else".

This sort of a developer is dangerous to have developing Bitcoin's internal structures. I think it's better they stay away.
2112
Legendary
*
Offline Offline

Activity: 2128
Merit: 1060



View Profile
November 04, 2012, 06:55:47 PM
 #209

It will almost certainly require a specific implementation in every language clients need. This has dangers, but it's not impossible. Extensive test sets will be needed that have 100% coverage in a reference implementation, to be sure every edge case or weird rebalancing rule is tested.
I think the requirement for "every language" is a vast overkill. I would say from my past experience that it is sufficient to have a clean portable C implementation (or a C++ implementation in a C style, without reliance on unstable C++ language features like std::* or boost::*). Once that's done, in my experience the code can be transcribed to just about anything that is Turing-equivalent.

But such a C (or subset-C++) implementation will have to correctly deal with endianness and alignment issues.

I'm not sure if the core development team is willing to commit to an endian-correct (or endian-neutral) implementation of Bitcoin.

Please comment, critique, criticize or ridicule BIP 2112: https://bitcointalk.org/index.php?topic=54382.0
Long-term mining prognosis: https://bitcointalk.org/index.php?topic=91101.0
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1135


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
November 04, 2012, 06:56:15 PM
 #210

If you tell a developer, "Now you've got to learn what a Patricia tree is", and then "Now that you've implemented it, you've got to simulate numerous cases of rollbacks to test and feel good that your implementation works backwards as well as forward" you have just made many more developers say "to hell with it, I'll develop something else".

This sort of a developer is dangerous to have developing Bitcoin's internal structures. I think it's better they stay away.


Which sort of developer?  The one who revels in complexity, as though complexity breeds integrity?  This guy is surely already busy on his first implementation of Bitcoin, in assembly language.  He'll be done by 2017, assuming the architecture he's developing for is still popular enough that people will be able to run it.

Or do you mean the one who walks away?  And this benefits bitcoin because the fewer clients, the better?

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
etotheipi (OP)
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1093


Core Armory Developer


View Profile WWW
November 04, 2012, 07:00:58 PM
 #211

By recommending a binary tree, or rather, a Merkle tree in the form most resembling a binary tree, I am suggesting that from block to block, the miner has the option of either just updating the tree (in a well-defined deterministic manner but leaving it unbalanced) or updating and rebalancing the tree such that all of the leaf nodes are the same distance (or distance-1) from the root, again in a well-defined deterministic manner.

I am not suggesting leaving the implementation up to the STL or any other form of library.

I don't believe Patricia trees are "simpler" when measured in the amount of human learning and neurons one must dedicate to understanding the concept.  That doesn't mean I think it's too hard to learn, but rather, I doubt the cost (measured in terms of complexity of the specification) is worth the benefit.

If you tell a developer, "Now you've got to learn what a Patricia tree is", and then "Now that you've implemented it, you've got to simulate numerous cases of rollbacks to test and feel good that your implementation works backwards as well as forward" you have just made many more developers say "to hell with it, I'll develop something else".

... not to mention implementing a chain reorg strategy consisting of "now talk to your peers and start asking them for now-orphaned blocks (hopefully they have them still) so you can roll your tree back intact" rather than starting with a copy of the tree at or before the point in time of the split and rolling it forward.

Either way, the developer has to get into the implementation details of the data structure.  They have to understand it.  And really, neither structure is particularly complicated.  Perhaps some devs are more familiar with BSTs.  But to say that a miner "has the option" to rebalance -- that doesn't make sense.  Any rebalancing operation on a BST will change the root hash.  It must be determined from the start exactly when and how rebalance ops will happen.  Or else everyone gets different answers.

And as for the comment about "simulating numerous cases of rollbacks" -- This case is dramatically simpler with a patricia tree structure -- you just add and remove elements from the tree using standard insert & delete operations.  It doesn't get much simpler than that (besides maybe keeping around the last few blocks worth of deleted TxOuts, which is probably a few kB).  On the other hand, you may be talking about gigabytes of data to store "backups" or "snapshots" of the BST, just in order to accommodate the possibility of a rollback.  And how many copies do you need to store?  You can keep the last state of the tree, but what if there's a 2-block reorg?  Well, now you need two copies.  To handle arbitrary-sized rollbacks, you could really be thrashing your hard-drive, and in such a way that everything has changed while you were attempting to swap gigabytes of data around.

Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1135


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
November 04, 2012, 07:06:04 PM
 #212

Either way, the developer has to get into the implementation details of the data structure.  They have to understand it.  And really, neither structure is particularly complicated.  Perhaps some devs are more familiar with BSTs.  But to say that a miner "has the option" to rebalance -- that doesn't make sense.  Any rebalancing operation on a BST will change the root hash.  It must be determined from the start exactly when and how rebalance ops will happen.  Or else everyone gets different answers.

I will clarify.  For every block, given the set of transactions contained in that block, there are 2 potential hash values that are acceptable as the root hash.  One of them represents the tree with the transactions applied to it.  This case is checked first, because it's the least expensive for a client to do so.  The second one represents the tree after it has been completely rebalanced.  A client should have no problem determining which way it went simply by trying the first case, and then the second.
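Sketched as code (the three callables are hypothetical stand-ins, just to show the order of the checks):

Code:
def verify_root(prev_tree, block_txs, claimed_root,
                apply_txs, merkle_root, rebalance):
    # apply_txs/merkle_root/rebalance stand in for whatever tree
    # implementation is agreed upon; this only shows the
    # cheap-then-expensive ordering of the two candidates.
    updated = apply_txs(prev_tree, block_txs)
    if merkle_root(updated) == claimed_root:
        return updated                    # miner skipped the rebalance
    rebalanced = rebalance(updated)
    if merkle_root(rebalanced) == claimed_root:
        return rebalanced                 # miner rebalanced this block
    raise ValueError('block commits to neither acceptable root')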

And as for the comment about "simulating numerous cases of rollbacks" -- This case is dramatically simpler with a patricia tree structure -- you just add and remove elements from the tree using standard insert & delete operations.  It doesn't get much simpler than that (besides maybe keeping around the last few blocks worth of deleted TxOuts, which is probably a few kB).  On the other hand, you may be talking about gigabytes of data to store "backups" or "snapshots" of the BST, just in order to accommodate the possibility of a rollback.  And how many copies do you need to store?  You can keep the last state of the tree, but what if there's a 2-block reorg?  Well, now you need two copies.  To handle arbitrary-sized rollbacks, you could really be thrashing your hard-drive, and in such a way that everything has changed while you were attempting to swap gigabytes of data around.

How do you envision rolling the tree back in the case where you have just determined that all of the blocks you have on hand are now invalid, and getting the now-correct state of the Patricia meta tree requires you to ask peers for orphaned blocks you don't have?  Must future Bitcoin implementations be required to keep orphan blocks on hand and serve them to peers to support the ability of others to roll their tree backwards?

In perspective, it's not the idea of Patricia or any other kind of tree I am having a problem with, it's the added complexity of supporting this sort of roll back that goes well beyond understanding a new kind of tree.

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
jojkaart
Member
**
Offline Offline

Activity: 97
Merit: 10


View Profile
November 04, 2012, 07:08:50 PM
 #213

This sort of a developer is dangerous to have developing Bitcoin's internal structures. I think it's better they stay away.
Which sort of developer?  The one who revels in complexity, as though complexity breeds integrity?  This guy is surely already busy on his first implementation of Bitcoin, in assembly language.  He'll be done by 2017, assuming the architecture he's developing for is still popular enough that people will be able to run it.

Or do you mean the one who walks away?  And this benefits bitcoin because the fewer clients, the better?

No, the developer you described clearly has no patience to test his code so that it works properly. We're better off without such developers.
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1135


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
November 04, 2012, 07:15:11 PM
Last edit: November 04, 2012, 07:32:31 PM by casascius
 #214

And as for the comment about "simulating numerous cases of rollbacks" -- This case is dramatically simpler with a patricia tree structure -- you just add and remove elements from the tree using standard insert & delete operations.  It doesn't get much simpler than that (besides maybe keeping around the last few blocks worth of deleted TxOuts, which is probably a few kB).  On the other hand, you may be talking about gigabytes of data to store "backups" or "snapshots" of the BST, just in order to accommodate the possibility of a rollback.  And how many copies do you need to store?  You can keep the last state of the tree, but what if there's a 2-block reorg?  Well, now you need two copies.  To handle arbitrary-sized rollbacks, you could really be thrashing your hard-drive, and in such a way that everything has changed while you were attempting to swap gigabytes of data around.

I don't believe the snapshots of the tree would be gigabytes.  I mentioned previously a simple scheme where each snapshot of the tree has a lifetime of 2^n blocks, where n is the number of binary zeroes the block height ends with.  So if the current block height is 0x12345, then you can expect to be storing the trees for 0x12344, 0x12340, 0x12300, 0x12200, 0x12000, 0x10000, and 0x0.  So you have a way to restore and roll forward any rollback simply by requesting blocks from peers as is already supported.

To address the specific case of "what about a 2-block reorg", at 0x12345, you'd use the backup at 0x12340 and move forward.

EDIT: Using 2^(n+1) might be better so that (for example) upon reaching block 0x20000, a two-block reorg does not require a fetch all the way back from 0x10000.

So, at 0x12345, using 2^(n+1) you would have: 0x12344, 0x12342, 0x12340, 0x12338, 0x12300, 0x12280, 0x12200, 0x12100, 0x12000, 0x11000, 0x10000, 0x8000, and 0x0.

At 0x20000 instead of just 0x10000 and 0x0, you would have: 0x1ffff, 0x1fffe, 0x1fffc, 0x1fff8, 0x1fff0, 0x1ffe0, 0x1ffc0, 0x1ff80, 0x1ff00, 0x1fe00, 0x1fc00, 0x1f800, 0x1f000, 0x1e000, 0x1c000, 0x18000, 0x10000, and 0x0.  This is sort of the scheme I had in mind when I originally scribbled down 2^n; actually calculating it out made me realize I'd be storing far less than I was shooting for.
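For what it's worth, a few lines of Python (mine, illustration only) confirm that the original 2^n rule yields exactly the set listed above for height 0x12345:

Code:
def trailing_zeros(h):
    n = 0
    while h & 1 == 0:
        n += 1
        h >>= 1
    return n

def live_snapshots(height):
    # Snapshots still within their 2^n-block lifetime;
    # height 0 is kept forever.
    live = [h for h in range(1, height)
            if height - h <= 2 ** trailing_zeros(h)]
    return [0] + live

assert live_snapshots(0x12345) == [0x0, 0x10000, 0x12000, 0x12200,
                                   0x12300, 0x12340, 0x12344]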

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
etotheipi (OP)
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1093


Core Armory Developer


View Profile WWW
November 04, 2012, 07:17:03 PM
 #215

This sort of a developer is dangerous to have developing Bitcoin's internal structures. I think it's better they stay away.
Which sort of developer?  The one who revels in complexity, as though complexity breeds integrity?  This guy is surely already busy on his first implementation of Bitcoin, in assembly language.  He'll be done by 2017, assuming the architecture he's developing for is still popular enough that people will be able to run it.

Or do you mean the one who walks away?  And this benefits bitcoin because the fewer clients, the better?

No, the developer you described clearly has no patience to test his code so that it works properly. We're better off without such developers.


I'm not sure what you guys are talking about.  If you disagree with meta-chain at all, then state it as such.  

Otherwise, this discussion is about two different mechanisms to achieve the same end result.  My core argument is that the trie-based solution is much less complex overall, easier to implement and get right, and has numerous other benefits -- such as dead-simple rollbacks, and the fact that the whole thing is parallelizable (different threads/CPUs/servers can maintain different sub-branches of a patricia tree, and their results can easily be accumulated at the end -- this is not possible with BSTs).  If you want to contribute to this discussion, then please do.




Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1135


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
November 04, 2012, 07:21:15 PM
 #216

This sort of a developer is dangerous to have developing Bitcoin's internal structures. I think it's better they stay away.
Which sort of developer?  The one who revels in complexity, as though complexity breeds integrity?  This guy is surely already busy on his first implementation of Bitcoin, in assembly language.  He'll be done by 2017, assuming the architecture he's developing for is still popular enough that people will be able to run it.

Or do you mean the one who walks away?  And this benefits bitcoin because the fewer clients, the better?

No, the developer you described clearly has no patience to test his code so that it works properly. We're better off without such developers.


The best piece of code is the one that implements a specification whose complexity and edge cases require no testing because they don't exist.  We're better off with such specifications.

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
jojkaart
Member
**
Offline Offline

Activity: 97
Merit: 10


View Profile
November 04, 2012, 07:30:26 PM
 #217

I will clarify.  For every block, given the set of transactions contained in that block, there are 2 potential hash values that are acceptable as the root hash.  One of them represents the tree with the transactions applied to it.  This case is checked first, because it's the least expensive for a client to do so.  The second one represents the tree after it has been completely rebalanced.  A client should have no problem determining which way it went simply by trying the first case, and then the second.

The problem here is that the full rebalancing operation requires everyone to run the rebalancing algorithm to even verify it was done correctly. This means it has to be optimized so that even weaker systems are able to do it. Otherwise, there's no point in including the simpler algorithm. However, if you do optimize it that way, then the point of having the simpler algorithm vanishes completely and the whole design ends up simpler by just having the full rebalance algorithm.
2112
Legendary
*
Offline Offline

Activity: 2128
Merit: 1060



View Profile
November 04, 2012, 07:33:02 PM
 #218

Otherwise, this discussion is about two different mechanisms to achieve the same end result.  My core argument is that the trie-based solution is much less complex overall, easier to implement and get right, and has numerous other benefits -- such as dead-simple rollbacks, and the fact that the whole thing is parallelizable (different threads/CPUs/servers can maintain different sub-branches of a patricia tree, and their results can easily be accumulated at the end -- this is not possible with BSTs).  If you want to contribute to this discussion, then please do.
Again, the claim "this is not possible with BSTs" about impossibility of parallelism in b-trees is false. I wonder what is going on here?

casascius (Mike Caldwell)
November 04, 2012, 07:36:30 PM
 #219

The problem here is that the full rebalancing operation requires everyone to run the rebalancing algorithm to even verify it was done correctly. This means it has to be optimized so that even weaker systems are able to do it. Otherwise, there's no point in including the simpler algorithm. However, if you do optimize it that way, then the point of having the simpler algorithm vanishes completely and the whole design ends up simpler by just having the full rebalance algorithm.


I can't imagine that the rebalancing algorithm is going to be costlier in CPU time than validating sets of ECDSA signatures on incoming blocks as is already required.

The most expensive operation in rebuilding the tree is SHA256 hashing.  We're doing quadrillions of these hashes network-wide every 10 minutes.  What's a few million more per node, once every 10 minutes?

I can see wanting to avoid making every miner node rebalance the tree in response to every incoming transaction (i.e. every roll of Satoshi Dice) - this would motivate miners to mine empty blocks, and it is the reason for having the simpler option.  But a rebalance capped at a maximum of once per block sounds plenty realistic.

Pieter Wuille
November 04, 2012, 07:38:31 PM
Last edit: November 04, 2012, 07:49:18 PM by Pieter Wuille
 #220

Not sure how up-to-date you guys are with development of the reference client, but in the 0.8 validation engine ("ultraprune"), a first step towards ideas like the ones proposed in this thread was taken.  It may be interesting in this discussion.

We now do keep an explicit set of unspent transaction outputs, but 1) indexed by txid (as that is necessary for block validation) instead of by address, and 2) without explicit tree structure or authentication. Still, some of it is relevant.

As casascius proposed, it only keeps the currently-active UTXO state, and no effort is made to keep older trees available by sharing subtrees.  Keeping full snapshots of older trees is a bad idea in my opinion (and re-rolling the entire history even more so), but on-disk files with data to 'undo' the application of blocks to the UTXO set are an easy solution.  If you see blocks as patches to the UTXO structure, these undo files are the reverse patches.  They are only necessary for reorganisations, and as those happen far less frequently than accessing the currently active tree state, they don't need super-fast access anyway (simple rule in optimizing for cache effects: if two data sets have different access patterns, don't mix them).

If the block headers commit to (the hash of) these undo files as well as the merkle root of the current UTXO state, clients could ask their peers for backward movement data, which is enough to release a client stuck in a side chain without transferring the entire UTXO state. I see no reason for not doing this.

So, progressing to the full idea, the first step is adding an authentication/merkle structure on top of the UTXO state.  All that is needed is a hash in the coinbase that commits to the current state, plus the hash of the undo data required to move back to the previous block.  In case of not fully deterministic data structures (like balanced trees, as opposed to tries/patricia trees), the undo data perhaps needs to be extended with structural undo data, to have enough information to roll back to the exact same previous structure.

The last step is extending this to an address-to-txid index, or something equivalent. I don't think this will be hard, if you already have everything above.

PS: giving miners the choice to either rebalance or not is a bad idea, imho, as it leads to increased variation (and worse: variation under control of miners) in block validation. Especially since balancing after one addition is typically O(log n), and a full rebalancing of the entire tree is O(n).
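To make the undo-file idea concrete, here is a minimal sketch (purely illustrative Python, not the actual ultraprune code; the block/tx objects with txid, inputs, and outputs attributes are assumptions) of treating blocks as patches to the UTXO map and the undo record as the reverse patch:

Code:
def connect_block(utxo, block):
    """Apply a block to the UTXO map; return the undo record (reverse patch)."""
    undo = []
    for tx in block.txs:
        for txin in tx.inputs:                 # coinbase inputs skipped for brevity
            key = (txin.prev_txid, txin.prev_index)
            undo.append((key, utxo.pop(key)))  # remember each entry we delete
        for i, txout in enumerate(tx.outputs):
            utxo[(tx.txid, i)] = txout         # add the newly created outputs
    return undo

def disconnect_block(utxo, block, undo):
    """Reverse patch: drop the block's outputs, restore what it spent."""
    for tx in block.txs:
        for i in range(len(tx.outputs)):
            utxo.pop((tx.txid, i), None)
    for key, txout in reversed(undo):
        utxo[key] = txout

Note that this standalone variant stores the spent keys in the undo record itself; as discussed a few posts below, the reference approach recovers the spent txids from the forward block data instead.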

casascius (Mike Caldwell)
November 04, 2012, 07:48:48 PM
Last edit: November 04, 2012, 08:06:21 PM by casascius
 #221

As casascius proposed, it only keeps the currently-active UTXO state, and no effort is made to keep older trees available by sharing subtrees.  Keeping full snapshots of older trees is a bad idea in my opinion (and re-rolling the entire history even more so), but on-disk files with data to 'undo' the application of blocks to the UTXO set are an easy solution.  If you see blocks as patches to the UTXO structure, these undo files are the reverse patches.  They are only necessary for reorganisations, and as those happen far less frequently than accessing the currently active tree state, they don't need super-fast access anyway (simple rule in optimizing for cache effects: if two data sets have different access patterns, don't mix them).

My main thought when weighing this idea against my "snapshots" idea:  this idea asks for a storage burden of little objects that grows O(n) with the block chain and only supports a reorg as far back as the record goes, while the one I suggest is of bigger objects, grows only O(log n), and supports a reorg all the way back to the genesis block -- without requiring development, support, testing, or specifying any new functionality or requirements for storing and transferring orphan blocks among peers.

EDIT: another concern is that any rollback scheme that depends on the ability to fetch orphan blocks from peers is not only at risk of failing if those peers do not have those orphan blocks, but is also an avenue for attack.  If a node ends up bootstrapping from an isolated segment of the network that has a large number of orphan blocks (or is forced into this situation by an attacker), never seeing any full blocks or tree snapshots from the real network... and then suddenly connects to the "real" network, the real network will have no way to provide the orphan blocks needed to roll his tree out of fantasy land and back to earth.  He must start over from some point in the past, he has no choice.

Pieter Wuille
November 04, 2012, 08:03:26 PM
 #222

My main thought when weighing this idea against my "snapshots" idea:  this idea asks for a storage burden of little objects that grows O(n) with the block chain and only supports a reorg as far back as the record goes, while the one I suggest is of bigger objects, grows only O(log n), and supports a reorg all the way back to the genesis block -- without requiring development, support, testing, or specifying any new functionality or requirements for storing and transferring orphan blocks among peers.

I'm not sure what you mean.  Sure, you only need O(log n) separate objects, while I need O(n) separate objects.  However, in total you store way, way more data.  Currently the "forward" block data is 3.75 GB, and the undo data all the way back to the genesis block is 0.48 GB.  The UTXO set itself is 0.13 GB, so just 4 snapshots (4 x 0.13 GB = 0.52 GB) already require more storage than all the undo data needed to cover the entire history, and the undo approach avoids the O(n) operations to make the copies.  Sorry, but full snapshots are out of the question in my opinion.  Both explicit reverse-deltas (like I use) and multiple trees with subtree-sharing have significantly better complexity.  And yes, sure it may be somewhat harder to develop, but if Bitcoin is still around when the UTXO set is several gigabytes large, we'll be very thankful that we don't need to make fast copies of it continuously.

casascius (Mike Caldwell)
November 04, 2012, 08:20:48 PM
 #223


I'm not sure what you mean.  Sure, you only need O(log n) separate objects, while I need O(n) separate objects.  However, in total you store way, way more data.  Currently the "forward" block data is 3.75 GB, and the undo data all the way back to the genesis block is 0.48 GB.

The way I understand it, this 0.48 GB of undo data is useless without part of the 3.75 GB of block data to go with it.  I have assumed this because in order to roll back a block, you need to replace elements of the UTXO set that were cut out during forward motion, which clearly don't all fit in 0.48 GB.  And in order to roll a block back, you need to download orphaned data from a peer -- data that is no longer part of the block chain, that isn't required storage for anybody, that nobody on the main network has if it originated from an attacker or isolated network, and that can't be assumed to be available from any given peer.

Meanwhile, my snapshots likewise depend on external data to properly roll things forward, but it is data that at least a) some nodes on the main network are guaranteed to have, and b) is worthwhile to download, since it is actual block chain data that is relevant to the network and worth storing and propagating.

The UTXO set itself is 0.13 GB, so just 4 snapshots (4 x 0.13 GB = 0.52 GB) already require more storage than all the undo data needed to cover the entire history, and the undo approach avoids the O(n) operations to make the copies.

That assumes all four snapshots are the same size.  How big was the block chain around block 0x10000?  Tiny compared to today is my guess, its size is certainly a fraction of 0.13 GB.  The earlier snapshots are likely to always be much smaller than the later ones.

That also assumes that they can't be stored on disk without any sort of differential coding.  Surely that won't help much for storing the snapshots for block 0x8000 and block 0x10000, but the snapshots between blocks in close succession (especially when no rebalance has been done) are easily written as "here's a snapshot" and "here's a snapshot-diff" (and "snapshot-diff" can easily just be defined as the block itself, something the client is probably already saving)

Pieter Wuille
November 04, 2012, 08:36:50 PM
 #224

The 0.48 GB does include all data ever removed from the txout set, in fact, as only a fraction of the block data is actual txouts - most is signatures, which never make it to the txout set.  However, you are right that the undo data does depend on the forward block data - not for the data itself, but for the txids being spent.  Compared to the data being removed, this is "local" data: to disconnect a block N you need the block data for N and the undo data for N (and notably not the block data of the transactions that were spent).  If the txids were stored in the undo data as well (to make it fully standalone), it would be around 1.5 GB (yes, seriously: the majority of undo data is not the "values" of the UTXO map, but the "keys").

Still, I think we're moving away from the real discussion.  We're not talking about today's storage requirements - if we were, nothing complex would be needed; the UTXO set is only 130 MB, and is completely scanned in seconds.  I do foresee the UTXO set growing significantly over the next years, and the problem is not how you store it: people should have the ability to store it in a small, compact way or in a larger way, by choosing different priorities of cpu/bandwidth/storage.

etotheipi (OP)
November 05, 2012, 01:26:51 AM
 #225

Again, the claim "this is not possible with BSTs" about impossibility of parallelism in b-trees is false. I wonder what is going on here?

I should've stated that differently -- it's not impossible to parallelize general BSTs, but since we are dependent on the underlying structure of the BST (which is not a normal requirement of BSTs in other applications), insert order must be strictly adhered to, and thus each insert operation must be completed before the next can be applied.  By definition, that is a serial process.  Along with that, sub-branches of the BST are not independent, so it's more complicated even to split up the storage space, since rebalance operations may shift nodes from one thread's storage space to another.  It's not an impossible task, it's just an unnecessarily complex one.

And I have a tough time believing that no one in the future will ever benefit from sub-tree independence:  with the BST, lite nodes can't even maintain their own little sub-trees for their addresses, because the structure of that subtree could change due to unrelated inserts nearby.  Or, the subtree could induce a rebalance itself, and change the root of the subtree that has to be tracked which may include other nodes it doesn't care about.

With the trie structure, sorted first by address then by OutPoint, a lite node can maintain sub-trees of each of its own addresses, insert and delete elements in any order, and roll back its subtree on reorgs without having to care about anyone else's OutPoints.  And all using simple, standard insert and delete ops, found in any textbook on the subject.
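To illustrate the insert-order independence, here is a toy Python sketch (my own, not Armory code, and a plain byte-trie rather than the Patricia/Brandais hybrid): two nodes inserting the same UTXO set in different orders arrive at the same root hash, because the trie's shape depends only on which keys are present.

Code:
import hashlib

def H(b):
    return hashlib.sha256(b).digest()

class TrieNode:
    def __init__(self):
        self.children = {}     # next key byte -> TrieNode
        self.value = None      # serialized UTXO stored at this key, if any

    def insert(self, key, value):
        if not key:
            self.value = value
            return
        self.children.setdefault(key[0], TrieNode()).insert(key[1:], value)

    def root_hash(self):
        data = self.value or b''
        for byte in sorted(self.children):     # deterministic child order
            data += bytes([byte]) + self.children[byte].root_hash()
        return H(data)

utxos = [(b'\x01\x02', b'out1'), (b'\x01\x03', b'out2'), (b'\x7f\x00', b'out3')]
a, b = TrieNode(), TrieNode()
for k, v in utxos:
    a.insert(k, v)
for k, v in reversed(utxos):
    b.insert(k, v)
assert a.root_hash() == b.root_hash()          # same set => same commitment

A balanced-BST commitment, by contrast, can hash differently depending on which rebalances the insertion order triggered, which is exactly the structure-dependence problem described above.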

But, I have to agree that the first implementer of this stuff is likely to set the standard.  I was hoping to persuade the folks who are working on it now that the trie structure is not only the simplest, but also the most flexible and robust.  Apparently, I'm not succeeding.    I better get back to fixing Armory so that maybe one day soon I can work on this problem, too Smiley

jl2012
November 05, 2012, 04:34:03 AM
 #226

I have a temporary solution.  Currently the satoshi client hard-codes hashes of historical blocks (currently up to about height 190000).  Could we also hard-code all the unspent outputs into the satoshi client up to a certain block height?  For each output, only the transaction id, output index, script, and block height are needed.  That would be all the information needed for verifying any future transactions and spending the coins (the block height is needed to calculate priority; if these unspent outputs are deep enough in the chain, the block height may be omitted as well).

Since the users can verify the hard-coded data with the blockchain, they don't really need to trust the development team. If the hard-coded data is widely distributed and independently verified by many people, normal users may simply accept it as-is.

jgarzik
November 05, 2012, 06:25:03 PM
 #227

I have a temporary solution.  Currently the satoshi client hard-codes hashes of historical blocks (currently up to about height 190000).  Could we also hard-code all the unspent outputs into the satoshi client up to a certain block height?

Yes.

Quote
For each output, only the transaction id, output index, script, and block height are needed.

You just need a hash of the UTXO set at a given height, really.  Then later UTXO sets may be provably traced to the checkpoint hash.
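As a sketch of what such a commitment could look like (a hypothetical canonical serialization of my own devising, not a specified format):

Code:
import hashlib, struct

def utxo_set_hash(utxos):
    """utxos: iterable of (txid: bytes, index: int, script: bytes, height: int)."""
    h = hashlib.sha256()
    for txid, index, script, height in sorted(utxos):    # canonical ordering
        h.update(txid)
        h.update(struct.pack('<II', index, height))
        h.update(struct.pack('<I', len(script)) + script)
    return hashlib.sha256(h.digest()).digest()           # double-SHA256

Any two nodes holding the same unspent-output set produce the same digest, so a later UTXO set can be checked by replaying blocks forward from the hard-coded checkpoint hash.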


d'aniel
December 12, 2012, 05:26:08 AM
Last edit: December 18, 2012, 09:19:48 PM by d'aniel
 #228


The "values" of each leaf is just the root of the sub tree, and the value of each branch is the skip-string concatenated with all its children's values.





I was thinking that to reduce the number of hashes a lightweight client would need to download when proving inclusion, you could replace the concatenation of the children's values in the formula for the value of branch node B with their Merkle root.  Then a lightweight client would only need log2(256) = 8 hashes 8 + (8 - 1) = 15 hashes per node to prove inclusion in the worst case scenario, instead of 256.

The downside is having to store more hashes (twice as many, worst case), but it actually appears to make updates faster, I guess due to fewer hashes needing to be accessed from memory.  This is because all node values updated during insertion/deletion/leaf update, except the one being attached to or removed from, have only a single leaf in the Merkle tree updated => no unbalancing issues, and only a single Merkle branch needs to be updated.
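A sketch of the mechanics with generic Merkle code (assuming Bitcoin-style duplication of an odd leaf; note that only the 8 sibling hashes actually cross the wire in this version, since the verifier recomputes the path hashes that make up the rest of the 15-hash count):

Code:
import hashlib

def h2(a, b):
    return hashlib.sha256(a + b).digest()

def merkle_root(leaves):
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])            # duplicate odd leaf
        level = [h2(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_branch(leaves, idx):
    """Sibling hashes needed to recompute the root from leaves[idx]."""
    branch, level = [], list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        branch.append(level[idx ^ 1])          # sibling at this level
        level = [h2(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        idx //= 2
    return branch

def verify(leaf, idx, branch, root):
    for sib in branch:
        leaf = h2(leaf, sib) if idx % 2 == 0 else h2(sib, leaf)
        idx //= 2
    return leaf == root

children = [hashlib.sha256(bytes([i])).digest() for i in range(256)]
root = merkle_root(children)
assert verify(children[71], 71, merkle_branch(children, 71), root)  # 8 siblings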

Also, this paper, 'Implementation of an Authenticated Dictionary with Skip Lists and Commutative Hashing', seems to offer an alternative to trees/tries: http://www.cs.brown.edu/cgc/stms/papers/discex2001.pdf
etotheipi (OP)
December 12, 2012, 06:26:00 AM
 #229


I was thinking that to reduce the number of hashes a lightweight client would need to download when proving inclusion, you could replace the concatenation of the children's values in the formula for the value of branch node B with their Merkle root.  Then a lightweight client would only need log2(256) = 8 hashes per node to prove inclusion in the worst case scenario, instead of 256.

The downside is having to store more hashes (twice as many, worst case), but it actually appears to make updates faster, I guess due to fewer hashes needing to be accessed from memory.  This is because all node values updated during insertion/deletion/leaf update, except the one being attached to or removed from, have only a single leaf in the Merkle tree updated => no unbalancing issues, and only a single Merkle branch needs to be updated.

That's a very good idea, as I've been concerned about how to reduce the number of hashes that need to be transferred for a lite-node to get its balance.  I had spent some time looking for "associative" hashing functions.  If they existed, the transfers would be tiny:  if a node is known to have Hash(A|B|C|D|E|F|G|H|I|J), and you want to prove inclusion of G, you simply supply {Hash(A|B|C|D|E|F), G, Hash(H|I|J)}.   So you would need to transfer at most 3 hashes per level.  Unfortunately, the only associative hash function I found was based on matrix multiplications, and was most definitely not cryptographically secure Sad

So, I welcome the idea of per-node merkle trees.  I wonder if the trees really need to be stored, though.  Hashing is so fast that recomputing the merkle tree on-demand may be fine, especially for the sparse, lower-level nodes.  Of course, the top couple levels could/should be cached since they'll get hit all the time and are pretty dense.

However, I'm not sure if I agree/understand the point about updating only Merkle branches.  Because the linked-lists at each node in the tree are sorted, the deletion/insertion of a node is likely to occur in the middle of the list and reverse the parity/pairings -- i.e.  you started with {A,C, D,H, J,M, Q,Y} -- the bottom level pairs (A,C), (D,H), (J,M) and (Q,Y).  Now, you insert E and the list looks like: {A,C, D,E, H,J, M,Q, Y,Y}, which means that all branches to the right of the insertion or deletion need to be recalculated.

On the other hand, you may be recomputing sparse nodes all the time anyway (meaning this recomputation will happen regardless), and the dense higher-level nodes could be batch-updated -- you know that a given block is going to modify 77 of your 256 branches at the top level, so you don't need to recompute the tree until all 77 children are complete.


Also, this paper, 'Implementation of an Authenticated Dictionary with Skip Lists and Commutative Hashing', seems to offer an alternative to trees/tries: http://www.cs.brown.edu/cgc/stms/papers/discex2001.pdf

That's an interesting paper, though I have trouble envisioning how they could be made deterministic for the purposes of authenticated structures.  Maybe I just need to read the paper a little closer.  Unfortunately, it's late, so I will have to come back to this later.

d'aniel
December 12, 2012, 01:58:33 PM
Last edit: December 12, 2012, 02:32:10 PM by d'aniel
 #230

That's a very good idea, as I've been concerned about how to reduce the number of hashes that need to be transferred for a lite-node to get its balance.  I had spent some time looking for "associative" hashing functions.  If they existed, the transfers would be tiny:  if a node is known to have Hash(A|B|C|D|E|F|G|H|I|J), and you want to prove inclusion of G, you simply supply {Hash(A|B|C|D|E|F), G, Hash(H|I|J)}.   So you would need to transfer at most 3 hashes per level.  Unfortunately, the only associative hash function I found was based on matrix multiplications, and was most definitely not cryptographically secure Sad
Funny, I was looking for an associative hashing function as well.  I gave up after thinking they would seem to defeat the purpose of Merkle trees altogether, and thus likely don't exist (yet?).

Quote
However, I'm not sure if I agree/understand the point about updating only Merkle branches.  Because the linked-lists at each node in the tree are sorted, the deletion/insertion of a node is likely to occur in the middle of the list and reverse the parity/pairings -- i.e.  you started with {A,C, D,H, J,M, Q,Y} -- the bottom level pairs (A,C), (D,H), (J,M) and (Q,Y).  Now, you insert E and the list looks like: {A,C, D,E, H,J, M,Q, Y,Y}, which means that all branches to the right of the insertion or deletion need to be recalculated.
My point was that for insertion/deletion, only the trie node being attached to/removed from requires an insertion/deletion in the leaves of its Merkle tree.  All of this node's parents simply have to update the value of one of the leaves in the Merkle trees, i.e. no insertions/deletions, and this requires only updating a single branch in each parent's Merkle tree.  The node being attached to/removed from does require an insertion/deletion in its Merkle tree, but this would usually happen lower down the trie, where nodes branch less, and Merkle tree rebalancing is faster.

Quote
Also, this paper, 'Implementation of an Authenticated Dictionary with Skip Lists and Commutative Hashing' seems to offer an alternative solution than trees/tries: http://www.cs.brown.edu/cgc/stms/papers/discex2001.pdf

That's an interesting paper, though I have trouble envisioning how they could be made deterministic for the purposes of authenticated structures.  Maybe I just need to read the paper a little closer.  Unfortunately, it's late, so I will have to come back to this later.
To make it deterministic, assuming the hash of each node's element is uniformly distributed, this hash value could be used as a "random" number for determining the tower height.  Though, this provides an avenue for attack by users purposely building lots of "tall towers".  To avoid this, each UTxO could also get randomness from the Merkle root of the tx tree that the UTxO showed up in, something a non-miner attacker has no control over.  It might also be hard enough for even a miner to perform the attack, since he has only one lever to control many degrees of freedom with.  This is just off the cuff, though, so I'm sure there are better ways to do it, or perhaps I'm not understanding it correctly.
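A quick sketch of that determinism (my own guess at a construction, not anything specified in the paper): derive each element's tower height from a hash of the element mixed with the tx-tree merkle root, with each additional level occurring with probability 1/2, as in an ordinary randomized skip list.

Code:
import hashlib

def tower_height(utxo_key, tx_merkle_root, max_height=32):
    digest = hashlib.sha256(utxo_key + tx_merkle_root).digest()
    bits = int.from_bytes(digest, 'big')
    height = 1
    while bits & 1 and height < max_height:    # each trailing 1-bit adds a level
        height += 1
        bits >>= 1
    return height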

I think the thing in this paper is patented, though.

Edit: You should name this beautiful thing, cause "hybrid PATRICIA/de la Brandais trie with authenticating Merkle trees" doesn't exactly roll off the tongue Smiley
etotheipi (OP)
December 13, 2012, 05:54:15 AM
 #231

My point was that for insertion/deletion, only the trie node being attached to/removed from requires an insertion/deletion in the leaves of its Merkle tree.  All of this node's parents simply have to update the value of one of the leaves in the Merkle trees, i.e. no insertions/deletions, and this requires only updating a single branch in each parent's Merkle tree.  The node being attached to/removed from does require an insertion/deletion in its Merkle tree, but this would usually happen lower down the trie, where nodes branch less, and Merkle tree rebalancing is faster.

Ahh, excellent point.   On that note, isn't it actually 15 hashes per full merkle tree of 256 nodes?  Because you need not only the straight branch from root to leaf, but also the siblings of each of those nodes so you can confirm.  Though, it's still dramatically less than sending all 256, and it makes the bandwidth requirements of this structure much more pleasant.


Edit: You should name this beautiful thing, cause "hybrid PATRICIA/de la Brandais trie with authenticating Merkle trees" doesn't exactly roll off the tongue Smiley

How about the "Reiner Tree"?   Tongue  It's funny you asked that, because the other day my dad was asking me if I have any part of Bitcoin named after me, yet...

casascius (Mike Caldwell)
December 13, 2012, 06:28:11 AM
 #232

How about the "Reiner Tree"?   Tongue  It's funny you asked that, because the other day my dad was asking me if I have any part of Bitcoin named after me, yet...

I get stuff like this too, for being the ultimate bitcoin freak in my circles.

I talk about Bitcoins all the time, am everybody's Bitcoin broker, give away Casascius Coins and paper bitcoins for every occasion, and I now have the Utah "BITCOIN" vanity license plate.  People think I'm Satoshi and just not telling, and gifts I have received for birthdays and Christmas are often bitcoin-themed; one included a book from the bookstore about "the history of money", spoofed with my coins being part of history: content from my website was mocked up in the style of the book and facetiously glued into the middle of it.

etotheipi (OP)
December 15, 2012, 02:34:10 PM
 #233

I just realized something that will need to be addressed with any such scheme that is used:  how will a soon-to-be-full-validation-but-pruned node bootstrap off the network?  Sure, we can say that at block X, the entire set of UTXOs is well-defined and compact (in whatever tree structure we're talking about).  But any new node that jumps onto the network will have to download that entire UTXO set, and surely peers will have updated their internal UTXO set/tree at least once while the node is downloading.   This means that even if a node can start supplying the UTXO set at block X, within an hour that node will be on X+3, or something like that.  Branches and leaves of the tree will have changed, and that node will not recognize the branches and leaves of the previous state (well, most will be the same, but you get the point).

This is resolved in the main network by having persistent, authenticated information that can be downloaded in chunks (i.e. blocks in the blockchain), which are still valid, no matter how long it's been since that block was created.  Each block can be downloaded quickly, and checked directly against the header chain.  However, in this case, the entire tree is to be authenticated against a block header, and you pretty much have to download the entire tree before you can confirm any part of it.  Seems like this could be a problem...

One idea, which doesn't seem ideal, is that the node simply stores a "snapshot" at every retarget event.  Every new node will have a two week window to download the UTXO set from the latest retarget, and then download the block history that has occurred since then.  If the new node also has a wallet, it can use the absolute latest meta-header to get its own balance and UTXO list and let the user manage their wallet using the still-secure address branches, it just won't be able to fully sync and start full validation until it's done downloading the latest retarget tree.

That's not an ideal solution, but it's food for thought...

jgarzik
December 15, 2012, 04:32:15 PM
 #234

I just realized something that will need to be addressed with any such scheme that is used:  how will a soon-to-be-full-validation-but-pruned node bootstrap off the network?  Sure, we can say that at block X, the entire set of UTXOs is well-defined and compact (in whatever tree structure we're talking about).  But any new node that jumps onto the network will have to download that entire UTXO set, and surely peers will have updated their internal UTXO set/tree at least once while the node is downloading.

The prevailing idea is that you download the block chain from "archive nodes", which are nodes that retain the full blockchain.


etotheipi (OP)
December 16, 2012, 02:32:01 AM
 #235

I just realized something that will need to be addressed with any such scheme that is used:  how will a soon-to-be-full-validation-but-pruned node bootstrap off the network?  Sure, we can say that at block X, the entire set of UTXOs is well-defined and compact (in whatever tree structure we're talking about).  But any new node that jumps onto the network will have to download that entire UTXO set, and surely peers will have updated their internal UTXO set/tree at least once while the node is downloading.

The prevailing idea is that you download the block chain from "archive nodes", which are nodes that retain the full blockchain.

I'm not talking about downloading the entire blockchain.  I'm talking about downloading just the pruned UTXO-tree.  Every pruned-full node knows what the UTXO-tree looks like at a given time, but before it can finish telling you what it looks like, it's updating itself with new tx and blocks and changing it.  I'm not seeing a clean way to transfer pruned-blockchain data from a node that is constantly updating its own meta tree.    Ultimately, it's because knowing what the network looked like at block X (from the pruned blockchain perspective), does not mean it's easy/possible to tell me what it looked like at block X-30.  And it's probably a lot of data and complexity to accommodate taking a "snapshot" just for helping other full nodes catch up.

You could say:  well they should just download the entire transaction history and build the UTXO set themselves.  I'm sure some users will do that, either out of paranoia, or for historic reasons, etc.  But it most definitely shouldn't be a requirement to download 100 GB of history just to get 2 GB worth of UTXOs.   The data needed to become a pruned-but-still-full-validation node is a fraction of the entire-and-ever-growing history, and it would be a pretty significant waste of resources to not be able to download the raw UTXO list to get up and running.

As I said, the simplest is probably to have nodes just spend the space on a snapshot at every retarget, and let nodes synchronize with that (or perhaps every 500 blocks or something, as long as all nodes pick the same frequency so that you can download from lots of sources simultaneously).  After that, they can download the few remaining blocks to update their own tree, appropriately.

This could affect pruned-lite nodes, too.  If there are updated meta-chain hashes every block, then even transferring small "address branches" to lite nodes could be "interrupted" by new tx/blocks coming in that change the seeder's UTXO-tree before it's finished.


d'aniel
December 18, 2012, 09:33:12 PM
 #236

On that note, isn't it actually 15 hashes per full merkle tree of 256 nodes?
Yeah, whoops.

Regarding the issue of synching one's Reiner tree: Smiley is it really a problem this proposal needs to solve?  Couldn't the client just wait to build/update it until after he's caught up with the network in the usual way?
casascius (Mike Caldwell)
December 18, 2012, 10:16:31 PM
 #237

As I said, the simplest is probably to have nodes just spend the space on a snapshot at every retarget, and let nodes synchronize with that (or perhaps every 500 blocks or something, as long as all nodes pick the same frequency so that you can download from lots of sources simultaneously).  After that, they can download the few remaining blocks to update their own tree, appropriately.

I had come up with a scheme for deciding how long to keep each snapshot that I thought would balance space and usefulness well.

If the block height (in binary) ends in 0, keep it for 4 blocks.
If 00, keep for 8 blocks.
If 000, keep for 16 blocks.
If 0000, keep for 32 blocks.
If 00000, keep for 64 blocks... etc. all the way to the genesis block.
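In code, the rule amounts to counting the trailing zero bits of the height (a quick sketch; the handling of odd heights and the genesis block is my assumption, since the rule above doesn't spell those out):

Code:
def retention_blocks(height):
    """How many blocks to keep the snapshot taken at this height."""
    if height <= 0:
        return 1 << 62                            # genesis: keep forever
    k = (height & -height).bit_length() - 1       # trailing zero bits of height
    return 2 ** (k + 1) if k >= 1 else 0          # odd heights: assume none kept

This keeps roughly logarithmically many snapshots alive at any moment, densely packed near the tip and sparse back towards the genesis block.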

etotheipi (OP)
December 18, 2012, 10:43:03 PM
Last edit: March 03, 2013, 05:24:09 PM by etotheipi
 #238

Is it possible to read somewhere exactly what is stored in a block in pseudocode?

A block, as it is stored on disk, is very straightforward and easily parsed:

Code:
[MagicBytes(4) || BlockSize(4) || RawHeader(80) || NumTx(var_int) || RawTx0 || RawTx1 || ... || RawTxN]

The blk*.dat files are just a list of binary sequences like this.
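For anyone who wants to poke at the files, a minimal Python sketch of walking that layout (assumes mainnet magic bytes 0xF9BEB4D9 and no zero-padding between records; the raw transactions are left unparsed):

Code:
import struct

MAGIC = bytes.fromhex('f9beb4d9')

def read_varint(buf, pos):
    """Bitcoin-style var_int: 1, 3, 5, or 9 bytes."""
    first = buf[pos]
    if first < 0xfd:
        return first, pos + 1
    size = {0xfd: 2, 0xfe: 4, 0xff: 8}[first]
    fmt = {2: '<H', 4: '<I', 8: '<Q'}[size]
    return struct.unpack_from(fmt, buf, pos + 1)[0], pos + 1 + size

def iter_blocks(path):
    with open(path, 'rb') as f:
        data = f.read()
    pos = 0
    while pos + 8 <= len(data):
        assert data[pos:pos + 4] == MAGIC
        (block_size,) = struct.unpack_from('<I', data, pos + 4)
        block = data[pos + 8 : pos + 8 + block_size]
        header = block[:80]
        num_tx, _ = read_varint(block, 80)
        yield header, num_tx          # full tx parsing omitted for brevity
        pos += 8 + block_size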

etotheipi (OP)
December 18, 2012, 10:51:08 PM
 #239

On that note, isn't it actually 15 hashes per full merkle tree of 256 nodes?
Yeah, whoops.

Regarding the issue of synching one's Reiner tree: Smiley is it really a problem this proposal needs to solve?  Couldn't the client just wait to build/update it until after he's caught up with the network in the usual way?

Well, I'm hoping that it will be possible to not need to "catch up with the network" in the current sense.  Certain types of nodes will only care about having the final UTXO set, not replaying 100 GB of blockchain history just to get their 2 GB of UTXO data.  I'd like it if such nodes had a way of sharing these UTXO trees without using too much resources, and without too much complication around the fact that the tree is changing as you are downloading. 

One core benefit of the trie structure is that nodes can simply send a raw list of UTXOs, since insertion order doesn't matter (and thus deleted UTXOs don't need to be transferred).  Sipa tells me there's currently about 3 million UTXOs, so at 36 bytes each, that's about 100 MB to transfer.  There are, of course, the raw transactions with any remaining UTXO that need to be transferred, too -- currently 1.3 million out of about 10 million total tx in the blockchain.  So that's probably another few hundred MB, but still only a fraction of the 4.5 GB blockchain.
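Checking that back-of-envelope figure (the 36 bytes presumably being the 32-byte txid plus 4-byte output index; that breakdown is my assumption):

Code:
num_utxos      = 3 * 10**6      # per sipa, at the time of writing
bytes_per_utxo = 36             # e.g. 32-byte txid + 4-byte output index
print(num_utxos * bytes_per_utxo / 1e6)   # 108.0 MB, i.e. "about 100 MB"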


As I said, the simplest is probably to have nodes just spend the space on a snapshot at every retarget, and let nodes synchronize with that (or perhaps every 500 blocks or something, as long as all nodes pick the same frequency so that you can download from lots of sources simultaneously).  After that, they can download the few remaining blocks to update their own tree, appropriately.

I had come up with a scheme for deciding how long to keep each snapshot that I thought would balance space and usefulness well.

If the block height (in binary) ends in 0, keep it for 4 blocks.
If 00, keep for 8 blocks.
If 000, keep for 16 blocks.
If 0000, keep for 32 blocks.
If 00000, keep for 64 blocks... etc. all the way to the genesis block.

That's a generalization of what I proposed:  if(blockheight mod 2016 == 0) {storeSnapshotFor2016Blocks}.  Clearly, the modulus needs to be calibrated...  The problem is that these snapshots are very expensive to store, so we would prefer not to keep any at all.  But one may be necessary -- hopefully not more than that.  Although it would be great if I just overlooked something and we could do this without snapshots entirely.

casascius (Mike Caldwell)
December 18, 2012, 11:31:59 PM
 #240

As I said, the simplest is probably to have nodes just spend the space on a snapshot at every retarget, and let nodes synchronize with that (or perhaps every 500 blocks or something, as long as all nodes pick the same frequency so that you can download from lots of sources simultaneously).  After that, they can download the few remaining blocks to update their own tree, appropriately.

I had come up with a scheme for deciding how long to keep each snapshot that I thought would balance space and usefulness well.

If the block height (in binary) ends in 0, keep it for 4 blocks.
If 00, keep for 8 blocks.
If 000, keep for 16 blocks.
If 0000, keep for 32 blocks.
If 00000, keep for 64 blocks... etc. all the way to the genesis block.

That's a generalization of what I proposed:  if(blockheight mod 2016 == 0) {storeSnapshotFor2016Blocks}.  Clearly, the modulus needs to be calibrated...  The problem is that these snapshots are very expensive to store, so we would prefer not to keep any at all.  But one may be necessary -- hopefully not more than that.  Although it would be great if I just overlooked something and we could do this without snapshots entirely.

Yes, except that what you're proposing stores snapshots spaced evenly over the entirety of the blockchain, while I'm proposing one that is biased towards having more options that are all recent.  My reason for doing so is that it seems plausible that someone using a client that tracks only the UTXO set might be given a choice as to how far back in blockchain history they want to request from peers (e.g. 1 hour? 8 hours? 1 day? 1 week? 1 month? 3 months? 6 months? 1 year? 2 years? 4 years? everything?), and it would make sense to put the resources into accommodating selections that look like that, versus ones like 2 weeks? 4 weeks? 6 weeks? ... 1028 weeks? 1030 weeks? 1032 weeks?  etc.

Of course, if snapshots end up not being the best solution, then I'm all for that as well.

etotheipi (OP)
December 18, 2012, 11:52:04 PM
 #241

Of course, if snapshots end up not being the best solution, then I'm all for that as well.

Well, I am not seeing a way around using snapshots.  I was hoping someone more insightful than myself would point out something simpler, but it hasn't happened yet...

Also, as mentioned earlier, I think snapshots are wildly expensive to store.  I think if a node wants block X and the nearest snapshot is for block X+/-100, then he can get that snapshot and the 100 blocks in between, and rewind or fast-forward the UTXO tree on his own.  The rewinding and fast-forwarding should be extremely fast once you have the block data.

Although this does open the question of how nodes intend to use this data.  If it turns out they will want to understand how the blockchain looked at multiple points in time, then perhaps it's worth the effort to store all these snapshots.  If it never happens, then the fast-forward/rewind would be better.  My thoughts on this are:

(1) The gap between snapshots should be considered relative to the size of a snapshot.  My guess is that 100 blocks of data is smaller than a snapshot, and thus you never need snapshots more frequent than that.
(2) Snapshots at various points in time actually won't be that useful, other than helping other nodes download.  These kinda-full-nodes only care about the latest state of the UTXO tree, nothing else.  If you think there are other reasons, please point them out.


casascius (Mike Caldwell)
December 18, 2012, 11:58:25 PM
 #242

Of course, if snapshots end up not being the best solution, then I'm all for that as well.

Well, I am not seeing a way around using snapshots.  I was hoping someone more insightful than myself would point out something simpler, but it hasn't happened yet...

Also, as mentioned earlier, I think snapshots are wildly expensive to store.  I think if a node wants block X and the nearest snapshot is for block X+/-100, then he can get that snapshot and the 100 blocks in between, and rewind or fast-forward the UTXO tree on his own.  The rewinding and fast-forwarding should be extremely fast once you have the block data.

Although this does open the question of how nodes intend to use this data.  If it turns out they will want to understand how the blockchain looked at multiple points in time, then perhaps it's worth the effort to store all these snapshots.  If it never happens, then the fast-forward/rewind would be better.  My thoughts on this are:

(1) The gap between snapshots should be considered relative to the size of a snapshot.  My guess is that 100 blocks of data is smaller than a snapshot, and thus you never need snapshots more frequent than that.
(2) Snapshots at various points in time actually won't be that useful, other than helping other nodes download.  These kinda-full-nodes only care about the latest state of the UTXO tree, nothing else.  If you think there are other reasons, please point them out.



The operators of the other nodes may care about holding on to a certain amount of history, just as greater assurance that they're really dealing with the real block chain.  One could argue that one hour or one day worth of history is enough, but if I'm about to start mining and want to do my part to help the network, I might feel more comfortable if I made it a month or a year, especially if I had the bandwidth to spare.  The more information I hold, the more history I'm "seeding" just in case it's needed in the event of a reorg.

d'aniel
December 19, 2012, 12:47:21 AM
 #243

How about this: download whatever non-synchronized UTxO set you can from peers, then start downloading blocks backwards, adding any missing new txouts and removing any that were spent during the download.  Then, once you're a few blocks before the time you started the download, you could build the tree and make sure it hashes properly.
etotheipi (OP)
December 19, 2012, 02:41:34 AM
 #244

How about this: download whatever non-synchronized UTxO set you can from peers, then start downloading blocks backwards, adding any missing new txouts and removing any that were spent during the download.  Then, once you're a few blocks before the time you started the download, you could build the tree and make sure it hashes properly.

@d'aniel:  You might be right that it's possible to reconstruct the tree from an amalgamation of closely related states.  Though, I'm concerned that there's too many ways for that to go wrong.  Let's start a thought-experiment:  I have a fresh slate, with no Tx and no UTXO.  I then execute 65,536 requests for data, and download each one from a different peer (each request is for a different 2-byte prefix branch).  I will assume for a moment that all requests execute successfully, and we end up with something like the following:

[Illustration omitted: a raw trie over the UTXO space, partitioned at the two-byte prefix level, with branches served by peers at different block heights]

A couple notes/assumptions about my drawing:  

  • (1) I have drawn this as a raw trie, but the discussion is the same (or very close to the same) when you transition to the Patricia/Brandais hybrid.  Let me know if you think that's a bad assumption.
  • (2) We have headers up to block 1000.  So we ask one peer that is at block 1000 for all 65,536 trie-node hashes.  We verify it against the meta-chain header.
  • (3) We make attempts to download all 65,536 subtrees from a bunch of peers, and end up mostly with those for block 1000, but a few for 995-999, and a couple have given us block 1001-1002 because that was what they had by the time we asked them to send us that branch.  We assume that peers tell us what block they are serving from.
  • (4) Some branches don't exist.  Even though the second layer on the main network will always be at 100% density, there may be various optimization-related reasons to do this operation at a lower branch level where it's not at 100% density.
  • (4a) I've used green to highlight four situations that I don't think are difficult, but need to be aware of them.  Branch \x0202 is where the node hashes at block 1000 say it's an empty node, but is reported as having data by the peer serving us from block 1001.  \x0203 is the same, but with a peer serving block 993 telling us there is data there.  \x0302 and \x0303 are the inverse:  block 1000 has hashes for those trie-nodes, but when requested from peers serving at other points in time, they report empty.
  • (5) Downloading the transactions-with-any-unspent-txouts from sources at different blocks also needs to be looked at.  We do eventually need to end up with a complete list of tx for the tree at block 1000 (or 1002?).  I'm expecting that any gaps can be filled with subsequent requests to other nodes.

So, as a starter algorithm, we acquire all this data and an almost-full UTXO tree.  We also acquire all of the blocks between 993 and 1002.   One branch at a time, we fast forward or rewind that branch based on the tx in blocks 993-1002.  It is possible we will be missing block data needed (due to #5), but I assume we will be able to acquire that info from someone -- perhaps this warrants keeping tx in the node's database for some number of blocks after it is depleted, to make sure it can still be served to other nodes catching up (among other reasons).

On the surface, this looks workable and actually not terribly complicated.  And no snapshots required!  Just ask peers for their data, and make sure you know what block their UTXO tree is at.  But my brain is at saturation, and I'm going to have to look at this with a fresh set of eyes later this week, to make sure I'm not neglecting something stupid.
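The per-branch catch-up step might look like this (a hypothetical sketch; apply_block and rewind_block stand in for per-branch trie patch operations, with rewinding assuming the undo data discussed earlier in the thread):

Code:
def sync_branch(branch, at_height, target_height, blocks, undo):
    """Bring one downloaded sub-branch to the target block height."""
    while at_height < target_height:               # fast-forward newer blocks
        at_height += 1
        branch.apply_block(blocks[at_height])
    while at_height > target_height:               # rewind past-target blocks
        branch.rewind_block(blocks[at_height], undo[at_height])
        at_height -= 1
    return branch

Once every branch reports the target height, the node hashes the branches up to the root and compares against the meta-chain header.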

d'aniel
December 19, 2012, 04:25:13 AM
 #245

I love your inkscape graphics.  I downloaded it because of your mention Smiley

Doesn't it make more sense to start downloading from the bottom of the tree instead of the top?  Say, partition the address space up, and request all UTxOs that lie in a given partition - along with the full bounding branches - and then compute the missing node hashes up to the root.  Inclusion of each partition in some known block is verified, and then we'd just have to catch up the delayed partitions separately using full tx data.  The deterministic property of the tree makes the partition syncing trivial, and I assume tx data will be available within some relatively large time window for reorgs and serving, etc.

My brain is fried right now too, I'll have a closer look at what you wrote after some sleep.  Maybe I'm oversimplifying it...
etotheipi (OP)
December 19, 2012, 04:55:23 AM
Last edit: December 20, 2012, 06:36:55 PM by etotheipi
 #246

I love your inkscape graphics.  I downloaded it because of your mention Smiley

It's like VIM:  it's got a bit of a learning curve to be able to use it efficiently, but there's so many shortcuts and hotkeys that you can really fly once you have some experience (and yes, I do 100% of my code development in vim Smiley)

Doesn't it make more sense to start downloading from the bottom of the tree instead of the top?  Say, partition the address space up, and request all UTxOs that lie in a given partition - along with the full bounding branches - and then compute the missing node hashes up to the root.  Inclusion of each partition in some known block is verified, and then we'd just have to catch up the delayed partitions separately using full tx data.  The deterministic property of the tree makes the partition syncing trivial, and I assume tx data will be available within some relatively large time window for reorgs and serving, etc.

My brain is fried right now too, I'll have a closer look at what you wrote after some sleep.  Maybe I'm oversimplifying it...

I think we're saying the same thing:  I showed partitioning at the second level, but it really would be any level low enough to meet some kind of criteria (though I'm not sure what that criteria is, if you don't have any of the data yet).  It was intended to be a "partitioning from the bottom" and then filling up to the root once you have it all.  

I imagine there would be a P2P command that says "RequestHashes | HeaderHash | Prefix".  If you give it an empty prefix, that means start at root:  it will give you the root hash, followed by the 256 child hashes.  If you give it a prefix "\x01" it gives you the hash of the node starting at '\x01' and the hashes of its 256 children. This is important, because I think for this to work, you have to have a baseline for what the tree is going to look like for your particular target block.  I think it gets significantly more complicated if you are aiming for partitions that are from different blocks...

Then there'd be another command that says "RequestBranch | HeaderHash | Prefix | StartNode".  The header hash/height would be included only so that peers that are significantly detached from your state won't start feeding you their data -- e.g. because they don't recognize your hash, or because they are more than 100 blocks from the state you are requesting.  If the peer's state is within 100 blocks, they start feeding you that partition, ordered lexicographically.  They'll probably be transferred in chunks of 1000 nodes, and then you put in the next request using the 1000th node as the start node to get the next chunk.  Since we have branch independence and insert-order independence, the transfer should be stupid simple.
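To make the shape of these two messages concrete, here is a rough sketch in Java.  Everything in it is hypothetical -- the class names, field layout, chunk size, and the 100-block freshness window are just the numbers from the paragraphs above, not a finalized wire format:

Code:
// Hypothetical request for a node's hash plus its (up to) 256 child hashes.
class RequestHashes {
    final byte[] headerHash;  // block this UTXO-tree snapshot is anchored to
    final byte[] prefix;      // empty = root; {0x01} = node for prefix '\x01'

    RequestHashes(byte[] headerHash, byte[] prefix) {
        this.headerHash = headerHash.clone();
        this.prefix = prefix.clone();
    }
}

// Hypothetical request for one partition's leaves, streamed in chunks.
class RequestBranch {
    static final int CHUNK_SIZE = 1000; // nodes per reply, per the text above
    static final int MAX_LAG = 100;     // peers further away simply refuse

    final byte[] headerHash;
    final byte[] prefix;      // which partition is being synced
    final byte[] startNode;   // resume point: last key of the previous chunk

    RequestBranch(byte[] headerHash, byte[] prefix, byte[] startNode) {
        this.headerHash = headerHash.clone();
        this.prefix = prefix.clone();
        this.startNode = startNode.clone();
    }

    // Server-side freshness check: serve only if the requested state is
    // within MAX_LAG blocks of the state this peer actually holds.
    static boolean shouldServe(int requestedHeight, int myHeight) {
        return Math.abs(myHeight - requestedHeight) <= MAX_LAG;
    }
}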

Also something to note:  I think that the raw TxOut script should be the key for this tree.  Sure, a lot of these will have a common prefix, but PATRICIA trees will compress those anyway.  What I'm concerned about is something like multiple variations of the same address, such as a TxOut using hash160 vs using full public key.  That can lead to stupid side-effects if you are only requesting by addr.  

etotheipi (OP)
January 02, 2013, 11:37:44 PM  #247

Just a random update:  I was updating the top post and want a check on my math for the question:

"In a post-Reiner-Tree world, how much data must a lite-node download to know its balance and spend its coins, from scratch?"

Okay, so, I will assume a wallet has 100 addresses each holding 2 UTXOs and no information/history. 

Pre-Req: Need the headers for the main chain and meta-chain:  the meta-chain doesn't exist yet, but let's assume this is 2 years after implementation of the meta-chain (assume starting now).  There will be about 24 MB of main-chain headers, and 8 MB of meta-chain.  Total:  about 30 MB.  (NOTE: If this is eventually integrated into the main-chain coinbase headers (at the risk of invalid blocks if it's wrong), then there is no extra data to download.)

Also note, this data may simply be bundled with the app, so it doesn't really feel like a "download" to the user.  The app will take longer to download, but it will start up basically instantly.  For the remainder of this discussion, I assume that we've already downloaded the headers.

Balance verification info:  Before you even download your own balance and UTXO list, you need to collect and verify the tree information pointing to your addresses.  Once you have that, you can download the data itself and know you got the right thing.

Here I assume d'aniel's idea about merkle trees at each level.  That means that for a single address, I need to download 15 hashes at each tree level to verify that tree-node:  480 bytes per node.  For efficiency, I'll always just download all 256 hashes at the top level (8 kB) and the 480 bytes at each subsequent level.

I assume that the network is "big", with trillions of UTXOs, thus having to go down 6 levels to get to my address.
I assume that all 100 addresses have a different starting byte, maximizing the data I have to download by not sharing any tree-nodes below the first level. 
So the amount of data I download is Top Level (8kB) + 5 more levels (480 bytes * 5 levels * 100 addresses).

Total verification data:  ~250 kB

UTXO data itself:  After you get the verification data, you need the actual UTXOs.  We assume 100 addr * 2 UTXOs = 200 UTXOs.  Each UTXO consists of its 36-byte OutPoint followed by its value (8 B) and TxOut script (~26 B).  So ~70 bytes/UTXO * 200 UTXOs = 14 kB.

Total:  30 MB for header data downloaded once.  After that, you can throw away all wallet information and re-download on every application restart for 250 kB/100 addresses.

As I look at this, I realize that this is pretty darned conservative:  since the individual tree-nodes are compressed, the last 2-3 levels will require only 3-7 hashes each, not all 15.  So probably more like 150-200 kB for 200 UTXOs.

Am I neglecting anything?  This seems pretty darned reasonable!
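As a sanity check on the arithmetic, here is the same estimate as a tiny Java program; the constants are just the assumptions from this post (32-byte hashes, 15 sibling hashes per level, 256-way top level, 6 levels, no shared nodes below the first level):

Code:
public class LiteNodeBandwidth {
    public static void main(String[] args) {
        int addresses    = 100;
        int utxosPerAddr = 2;
        int hashSize     = 32;             // bytes per hash
        int perNode      = 15 * hashSize;  // 480 B of sibling hashes per level
        int topLevel     = 256 * hashSize; // ~8 kB: all 256 top-level hashes
        int levels       = 6;              // the "trillions of UTXOs" case

        // Top level once, then 5 more levels per address (worst case:
        // no shared tree-nodes below the first level).
        int verification = topLevel + perNode * (levels - 1) * addresses;

        // OutPoint (36) + value (8) + script (~26) per UTXO.
        int utxoData = (36 + 8 + 26) * addresses * utxosPerAddr;

        System.out.printf("verification: ~%d kB%n", verification / 1000); // ~248
        System.out.printf("utxo data:    ~%d kB%n", utxoData / 1000);     // ~14
    }
}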

d'aniel
January 03, 2013, 12:48:18 AM  #248

Quote
Am I neglecting anything?
Maybe just that for privacy reasons, instead of a request for txouts for a specific address, a lightweight client would probably want to submit a bloom filter to its peer that would yield enough false positives to obfuscate his ownership of the address.
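For illustration, a toy version of that idea -- a Bloom filter the client fills with its addresses and hands to a peer.  This sketch predates any agreed standard (the scheme later standardized as BIP 37 is similar in spirit), so the sizing and hashing here are purely illustrative:

Code:
import java.security.MessageDigest;
import java.util.BitSet;

public class AddressBloom {
    private final BitSet bits;
    private final int nBits;
    private final int nHashes;

    public AddressBloom(int nBits, int nHashes) {
        this.bits = new BitSet(nBits);
        this.nBits = nBits;
        this.nHashes = nHashes;
    }

    // Derive the i-th bit index from a seeded SHA-256 of the data.
    private int indexFor(byte[] data, int seed) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        md.update((byte) seed);
        byte[] h = md.digest(data);
        int v = ((h[0] & 0xff) << 24) | ((h[1] & 0xff) << 16)
              | ((h[2] & 0xff) << 8) | (h[3] & 0xff);
        return Math.floorMod(v, nBits);
    }

    public void insert(byte[] addr) throws Exception {
        for (int i = 0; i < nHashes; i++) bits.set(indexFor(addr, i));
    }

    // The peer tests every candidate against this; deliberately undersizing
    // nBits raises the false-positive rate, hiding the real addresses.
    public boolean maybeContains(byte[] key) throws Exception {
        for (int i = 0; i < nHashes; i++)
            if (!bits.get(indexFor(key, i))) return false;
        return true;
    }
}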
blueadept
January 03, 2013, 02:36:22 PM  #249

I may have missed you saying it, but in downloading the UTXO data, would it be better to download the entire transaction instead of just the OutPoint? That way, the client can verify the transaction actually matches the hash, and in combination with the header/PoW info and the Merkle branch, that fully authenticates the OutPoint.

Edit: never mind, I see it now. Under the assumption that the validity of the meta chain is enforced as part of the main chain's block validation rules, the meta chain's authentication is enough to verify the validity of the OutPoint.

d'aniel
January 17, 2013, 11:25:02 AM  #250

I was thinking rather than use a Merkle tree at each trie node for verifying the presence of a child, we could use a bitwise trie instead, since it would have faster updates due to always being balanced and not having to sort.  This would also speed up lookups, since we wouldn't have to traverse through a linked list of pointers at each node.

But then we've just arrived in a roundabout sort of way at the whole trie being essentially a bitwise trie, since the nodes in the original trie with 256 bit keysize overlap exactly with subtries of the bitwise trie.

Was there a reason for not proposing a single bit keysize from the start?  I thought about it a while back, but disregarded it for some reason I can't recall.
etotheipi (OP)
January 17, 2013, 09:04:07 PM  #251

Quote
I was thinking rather than use a Merkle tree at each trie node for verifying the presence of a child, we could use a bitwise trie instead, since it would have faster updates due to always being balanced and not having to sort.  This would also speed up lookups, since we wouldn't have to traverse through a linked list of pointers at each node.

But then we've just arrived in a roundabout sort of way at the whole trie being essentially a bitwise trie, since the nodes in the original trie with 256 bit keysize overlap exactly with subtries of the bitwise trie.

Was there a reason for not proposing a single bit keysize from the start?  I thought about it a while back, but disregarded it for some reason I can't recall.

I thought about the same thing, and disregarded it for some reason as well; I don't remember why.  Working with raw bits is a bit more complicated than bytes, especially when you have skip strings that are like 13 bits.  But I see the benefit that you don't even really need linked lists at each node, only two pointers.  But you end up with a lot more total overhead, since you have the trienode overhead for so many more nodes...

I'll have to think about this one.  I think there is clearly a benefit to a higher branching factor:  if you consider a 2^32-way Trie, there is a single root node and just N trienodes -- one for each entry (so it's really just a lookup table, and lookup is fast).  If you have a 2-way (bitwise) Trie, you still have N leaf nodes, but you have a ton of other intermediate nodes and all the data that comes with them.  And a lot more pointers and "hops" between nodes to get to your leaf.  It leads me to believe that you want a higher branching factor, but you need to balance against the fact that the branching adds some efficiency (i.e. in the case of using linked lists between entries, it would obviously be bad to have a branching factor of 2**32).
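To put rough numbers on that branching-factor tradeoff: expected depth in a well-balanced trie is about log_b(N).  A quick sketch (node-count effects like level compression are ignored; the UTXO count is just a ballpark figure from later in the thread):

Code:
public class BranchingFactor {
    public static void main(String[] args) {
        long n = 17_000_000L; // ballpark UTXO-set size used in this thread
        int[] factors = {2, 16, 256, 65536};
        for (int b : factors) {
            double depth = Math.log(n) / Math.log(b);
            System.out.printf("b=%-6d expected depth ~ %.1f hops%n", b, depth);
        }
        // b=2 -> ~24 hops, b=256 -> ~3: fewer, fatter nodes per lookup,
        // paid for with more per-node pointer overhead.
    }
}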


d'aniel
January 18, 2013, 02:51:23 AM  #252

Quote
But I see the benefit that you don't even really need linked lists at each node, only two pointers.  But you end up with a lot more total overhead, since you have the trienode overhead for so many more nodes...
More specifically, it will need somewhat less than double the amount of nodes, since every node in a 2-way trie has two children - i.e. the same number of extra nodes as the Merkle tree, which we were going to need to include anyway.

Quote
I'll have to think about this one.  I think there is clearly a benefit to a higher branching factor:  if you consider a 2^32-way Trie, there is a single root node and just N trienodes -- one for each entry (so it's really just a lookup table, and lookup is fast).  If you have a 2-way (bitwise) Trie, you still have N leaf nodes, but you have a ton of other intermediate nodes and all the data that comes with them.  And a lot more pointers and "hops" between nodes to get to your lefa.  It leads me to believe that you want a higher branching factor, but you need to balance against the fact that the branching adds some efficiency (i.e. in the case of using linked lists between entries, it would obviously be bad to have a branching factor of 2**32).
Wouldn't the bitwise trie actually require fewer hops, since it doesn't need to traverse through the linked list?  You seem to be saying this and at the same time saying it requires more :)

Assuming we were going to be serializing the original nodes and keying them individually in a database, going bitwise shouldn't affect this at all, since we would just prune off a bunch of crap to "zoom out" to a 256-way "macro-node", and conversely build in the details of a macro-node grabbed from storage to "zoom back in" to the bitwise subtrie.

Estimate of the amount of extra overhead this proposal will impose on existing clients

Since the network is switching to using a leveldb store of utxos now, we can easily make this estimate.  I'll ignore for now the fact that we're actually using a tree of tries.

Assuming we're actually storing 256-way macro-nodes, and that the trie is well-balanced (lots of hashes, should be), then for a set of N = 256^L utxos, a branch will have L macro-nodes plus the root to retrieve from/update to disk, instead of just the utxo like we do now.  But it makes sense to cache the root and first level of macro-nodes, and have a "write-back cache policy", where we only do periodic writes, so that the root and first-level writes are done only once per batch.  So for a batch size >> 256, we have more like L - 1 disk operations to do per utxo retrieval/update, or L - 2 more than we do now.  That's only one extra disk operation per retrieval/update with L = 3, or N = ~17M utxos total.

Due to the extreme sparsity, it probably doesn't make much sense to permanently cache beyond the first level of macro-nodes (16 levels of bitwise nodes ~ 2*2^16*(2*32 + 2*8) bytes: 2*2^16 nodes, 32 bytes per hash, 8 bytes per pointer, so ~10 MB of permanently cached data), since any macro-node in the next level can be expected to be accessed during a batch of, say, 10,000 writes only roughly 1/256^2 * 10,000 ~ 0.15 times.  So additional memory requirements should stay pretty low, depending on how often writes are done.

I guess to do it properly and not assume the trie is perfectly balanced, we would have an "LFU cache policy", where we track the frequency with which macro-nodes are accessed, and during writes throw out the least frequently used ones, keeping some desired number of those most frequently used.

Disk operations are almost certainly the bottleneck in this proposal, but they aren't in the main client, so it's possible that this wouldn't add much in the way of noticeable overhead.

Is my math correct?  (There was a small mistake I fixed in an edit.)
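A quick numeric check of the figures in this post, under the same assumptions (256-way macro-nodes, L = 3, 32-byte hashes, 8-byte pointers); a sketch only:

Code:
public class OverheadCheck {
    public static void main(String[] args) {
        int L = 3; // 256^3 = ~16.8M utxos
        System.out.println("disk ops per utxo: ~" + (L - 1) + " (vs 1 now)");

        // Permanent cache: root + first level as 16 levels of bitwise nodes.
        long nodes = 2L * (1 << 16);        // ~2 * 2^16 nodes
        long bytesPerNode = 2 * 32 + 2 * 8; // two hashes + two pointers
        System.out.printf("cache: ~%d MB%n", nodes * bytesPerNode >> 20); // ~10

        // Second-level macro-node: expected hits per 10,000-write batch.
        System.out.printf("hits/batch: ~%.2f%n", 10_000.0 / (256.0 * 256.0));
    }
}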
d'aniel
January 18, 2013, 06:36:02 AM  #253

Am I correct in thinking that the skip string and child keys don't really need to be hashed in at each step running up a Patricia trie, since the hash stored at the end of a given branch is of the full word formed by that branch, and inclusion of that hash in the root is proved regardless?

This would simplify the authentication structure a fair bit, allowing us to avoid having to hash weird numbers of bits, and make it not depend on whether or not we're compressing strings of single child nodes with skip strings (I don't know if this would ever be an advantage).
TierNolan
February 12, 2013, 05:49:51 PM  #254

Have you considered that since the hashes are well distributed, you could mostly skip tree balancing?

Basically a PATRICIA-Brandais tree without the skip function.  The odds of it helping are pretty low, and it adds complexity.

In your tree examples, the benefit is that the right-hand nodes end up with lower depth.  However, collisions on a sequence of bytes are pretty unlikely unless you are near the root, and it will be dense there anyway, so you don't get the benefit.

You could also drop the linked list and replace it with a hashmap.  However, most nodes (near the root) will be fully loaded, so maybe better to just use a PATRICIA tree directly.

A compromise would be to have 2 node types.  A node could use an (expanding) hashmap, until it gets to 50% full and then switch to a flat byte array.  However, that is an efficiency thing, and not part of the tree structure.

So, insert a node and then use its bits to determine where to place it, until it gets to a bit that doesn't match any other nodes.

Peter Todd
February 12, 2013, 07:46:25 PM  #255

Quote
Have you considered that since the hashes are well distributed, you could mostly skip tree balancing?

You can't make that assumption because an attacker might create a bunch of transactions by brute force search that just happen to create an unbalanced tree. If you have a system where hash values determine the balance of the tree, you have no choice but to have some sort of way to measure the amount the tree is out of balance, which might be difficult if not all nodes know the full state of the tree, and prohibit the creation of unbalanced trees. You also need to be careful to ensure that if a tree is at the threshold just prior to being too out-of-balance to be legal there are legal operations that make the tree balanced again so honest miners can fix the problem. Finally fixing an unbalanced tree has to always be cheap.

TierNolan
February 12, 2013, 08:00:05 PM  #256

Quote
You can't make that assumption because an attacker might create a bunch of transactions by brute force search that just happen to create an unbalanced tree.

It would only affect the transactions in question.  Also, creating unbalancing transactions requires generating hashes with the same starts, which is hard to do.

Quote
Finally fixing an unbalanced tree has to always be cheap.

If you don't do balancing, then it doesn't require constant merkle tree rehashes.

Peter Todd
February 12, 2013, 08:20:15 PM  #257

Quote
You can't make that assumption because an attacker might create a bunch of transactions by brute force search that just happen to create an unbalanced tree.

It would only affect the transactions in question.  Also, creating unbalancing transactions requires generating hashes with the same starts, which is hard to do.

There is nothing else in existence that has put as much towards finding statistically unlikely hashes as Bitcoin has! Heck, the Bitcoin network has found hashes with IIRC 66 zero bits.

You just cannot assume that hashes calculated directly from data that an attacker has any control of will be uniformly distributed under any circumstance. Someone will get some GPUs together and write a program to find the right hashes to unbalance your tree just because they can.

TierNolan
February 12, 2013, 08:47:18 PM  #258

Quote
You just cannot assume that hashes calculated directly from data that an attacker has any control of will be uniformly distributed under any circumstance. Someone will get some GPUs together and write a program to find the right hashes to unbalance your tree just because they can.

The effect doesn't really matter though.  It will just slow down accesses for those particular transactions.

Peter Todd
February 12, 2013, 09:05:39 PM  #259

Actually, here is an idea that could work: rather than making the generated UTXO set be the state of the UTXO tree for the current block, make it the state of the UTXO tree for the previous block. Then for the purposes of putting the tx in the tree, essentially define the tx hash not as H(d) but as H(b | d) with b equal to the hash of the block where the tx was confirmed. This works because finding a block hash is difficult, and once you have found any valid block hash not using it represents a huge financial hit.

Of course, this doesn't actually work directly, because you usually don't know the block number when a txout was created. So we'll actually do something a bit different:

For every node in your radix tree, store the block number at which the tree was last changed.  The 0th node, the empty prefix, gets block 0.  For the purpose of the radix tree, define the radix hash of the transaction, rh, as rh = H(hn | h), where hn is the hash of that block number, and h is the transaction's actual hash.  Ignoring the fact that the transaction hash prefix changed, we can treat the rest of the bytes of the radix tx hash normally to determine which prefix edge is matched, and thus which node to follow down the tree.

Other than having to provide the block hash, proving that a transaction is in the UTXO set is not really any different than before: the proof is still a merkle path, and the security is still based on the infeasibility of reversing a hash function.  I think it would be a good idea to add in some of the additional ideas I outlined in https://bitcointalk.org/index.php?topic=137933.msg1470730#msg1470730, but that is true of any UTXO idea.

If you really need to prove a tx was in the UTXO set as of the current best block, either wait for another block to be generated, or prove that it isn't in the UTXO set of the previous block, and prove that it is in the merkle tree of the current best block.
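A minimal sketch of that radix-key derivation; SHA-256 stands in for whatever hash would actually be chosen, and the block-number encoding is an assumption for illustration:

Code:
import java.nio.ByteBuffer;
import java.security.MessageDigest;

public class RadixKey {
    static byte[] sha256(byte[] data) throws Exception {
        return MessageDigest.getInstance("SHA-256").digest(data);
    }

    // rh = H(hn | h): hn is the hash of the block number stored on the node
    // where the edge decision is made, h is the transaction's real hash.
    static byte[] radixHash(int nodeBlockNumber, byte[] txHash) throws Exception {
        byte[] hn = sha256(ByteBuffer.allocate(4).putInt(nodeBlockNumber).array());
        byte[] preimage = new byte[hn.length + txHash.length];
        System.arraycopy(hn, 0, preimage, 0, hn.length);
        System.arraycopy(txHash, 0, preimage, hn.length, txHash.length);
        return sha256(preimage);
    }
}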

TierNolan
February 12, 2013, 10:30:28 PM  #260

Quote
Actually, here is an idea that could work: rather than making the generated UTXO set be the state of the UTXO tree for the current block, make it the state of the UTXO tree for the previous block. Then for the purposes of putting the tx in the tree, essentially define the tx hash not as H(d) but as H(b | d) with b equal to the hash of the block where the tx was confirmed. This works because finding a block hash is difficult, and once you have found any valid block hash not using it represents a huge financial hit.

This requires the tree to be completely regenerated for each block, since the hash of every transaction would change.

The attack on the system I suggested would require having lots of transactions that have the same initial bits in their hash.

If you had 1 million of them and added them to the tree, then it would become quite deep.  Each time you add one, the nodes would have to recalculate the entire path from root to that node and redo all the hashing.  The effort per node would be proportional to N squared, where N is the number of collisions.

A protection on that would be ignoring some transactions.

For example, if you have a binary tree, then the expected depth is log2(transactions).

With random hashes, it becomes very unlikely that any nodes would be much deeper than that.

A depth cutoff could be added.  If a branch is more than log2(transactions) * 4 deep, then it hashes to 0, so all nodes below it can be ignored.

This means that they cannot be verified as valid.

Quote
Of course, this doesn't actually work directly, because you usually don't know the block number when a txout was created. So we'll actually do something a bit different

You have to ask for proof anyway; the node could say in which block the tx was added to the tree.

So, the hash used would be hash(hashBlock, hashTransaction), where block is the block where the tx was added to the chain.

All unspent txs are <block, tx> pairs anyway, since you need to scan back to the block to get the sig script to check for spending.

You won't know what the block hash is until you generate the transactions, so I think this protects it?

Quote
For every node in your radix tree, store the block number of the tree was last changed.

Ideally, the tree should just depend on what transactions are contained.

Quote
If you really need to prove a tx was in the UTXO set as of the current best block, either wait for another block to be generated, or prove that it isn't in the UTXO set of the previous block, and prove that it is in the merkle tree of the current best block.

You will probably have to do something like that anyway.  The plan was to merge mine, so some bitcoin blocks won't have matching tree root node hashes.  So, prove it wasn't spent up to 10 blocks previous and then check the last 10 blocks manually.

Peter Todd
February 13, 2013, 01:11:04 AM  #261

Quote
This requires the tree to be completely regenerated for each block, since the hash of every transaction would change.

Read the rest of the message - that's just a toy example to illustrate the general concept. My worked example does not require the whole tree to be recalculated nor does it require the block number for lookup.

Quote
The attack on the system I suggested would require having lots of transactions that have the same initial bits in their hash.

If you had 1 million of them and added them to the tree, then it would become quite deep.  Each time you add one, the nodes would have to recalculate the entire path from root to that node and redo all the hashing.  The effort per node would be proportional to N squared, where N is the number of collisions.

I think your misunderstanding stems from the idea that the datastructure and the authenticating hash are the same thing. They aren't, they're quite separate. The authenticating hash is an addition to the datastructure, and can be calculated independently.

Take your example of adding a million transactions, which is unrealistic anyway as a 1MiB block can't have more than a few thousand. In a radix tree what you would do is first add each transaction to the tree, keeping track of the set of all modified nodes. Then once you had finished the update you would recalculate the hashes for each changed node, deepest first, in a single O(n) operation.
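The batching trick reads roughly like this sketch -- mark nodes dirty during inserts, then recompute each changed node's hash exactly once, deepest first (node layout and hashing here are toy stand-ins):

Code:
import java.util.*;

class BatchRehash {
    static class Node {
        int depth;
        byte[] hash;
        List<Node> children = new ArrayList<>();
    }

    // Recompute hashes for all nodes touched by a batch of inserts.
    // Sorting deepest-first guarantees children are done before parents.
    static void rehash(Set<Node> dirty) throws Exception {
        List<Node> order = new ArrayList<>(dirty);
        order.sort(Comparator.comparingInt((Node n) -> n.depth).reversed());
        for (Node n : order) {
            n.hash = combineChildHashes(n);
        }
    }

    // Placeholder: hash the concatenation of child hashes, in order.
    static byte[] combineChildHashes(Node n) throws Exception {
        java.io.ByteArrayOutputStream buf = new java.io.ByteArrayOutputStream();
        for (Node c : n.children) buf.write(c.hash, 0, c.hash.length);
        return java.security.MessageDigest.getInstance("SHA-256")
                .digest(buf.toByteArray());
    }
}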

A real Bitcoin node would basically have a limit on how much CPU time it was willing to use on adding transactions to the UTXO set, and collect incoming transactions into batches small enough to stay under that limit. Given we're only talking tens of updates/second chances are this optimization wouldn't even be required anyway.


Anyway, thinking about this a bit more, on reflection you are right and I think this unbalanced tree stuff is a total non-issue in any radix tree or similar structure where depth is a function of common-prefix length. For the 1.6million tx's in the UTXO set you would expect a tree 20 branches deep. On the other hand to find a transaction with n bits in common with another transaction requires n^2 work, so right there you aren't going to see depths more than 64 or so even with highly unrealistic amounts of effort thrown at the problem. (bitcoin has done a 66bit zero-prefix hash IIRC)

The issue with long depths is purely proof size, and 64*n isn't much worse than 20*n, so as you suggest, why make things complicated?

Too bad, it was a clever idea. :P


Quote
If you really need to prove a tx was in the UTXO set as of the current best block, either wait for another block to be generated, or prove that it isn't in the UTXO set of the previous block, and prove that it is in the merkle tree of the current best block.

You will probably have to do something like that anyway.  The plan was to merge mine, so some bitcoin blocks won't have matching tree root node hashes.  So, prove it wasn't spent up to 10 blocks previous and then check the last 10 blocks manually.

Yeah, until this is a hard and fast network rule you'll have to check that the hashing power devoted to these UTXO proofs is sufficient.

Speaking of... we should have every one of these UTXO things have a reference to the previous valid UTXO set. This would turn the whole shebang into a proper chain and thus allow clients to figure out both which is the longest chain, and how much hashing power is being devoted to it.

Equally if we think in terms of merge-mining a chain, adding support for additional UTXO indexes, such as known scriptPubKeys and so forth, is just a matter of adding additional chains to be merge mined, and UTXO-only and UTXO+scriptPubKey can co-exist just fine in this scenario. A disadvantage is we'll need to add some stuff to the P2P network, but I have another idea...

So right now, merge-mined chains have the problem that the proof of work goes through the coinbase transaction. The issue here is that to prove a path to the proof of work, you need the whole coinbase transaction, and it can be quite large, for instance in the case of P2Pool or Eligius. So I'm proposing a new standard, where one transaction of the following form is used to include an additional merge-mined digest, and that transaction will always contain exactly one txin, and one txout of value zero using the following:

scriptSig: <32-byte digest>
scriptPubKey:

Any digest matching this form will be assumed to represent the miner's hashing power, thus miners should not allow such transactions into their blocks blindly. They are currently non-standard, so this will not happen by default, and the scriptSig has no legitimate use. The scriptPubKey is spendable by the scriptSig, so for the txin miners would usually use the txout created by a previous block following this standard. If none are available the miner can insert an additional zero-fee transaction creating a suitable txout (of zero value!) in the block.

The digest would of course represent the tip of a merkle tree. Every merge mined digest in that tree will have a zero byte appended, and then the digests will be hashed together. What would go in the alt-chain is then the merkle path, that is every leaf digest required to get to the transaction, and what side the leafs were on. Note how this is different from the merge-mining standard currently used by namecoin, and fixes the issues it has with conflicting slot numbers.

Appending the zero byte is critical, because it means that to verify an alt-chain block hash was legitimately merge-mined, you simply check that every other digest in the path to the transaction has exactly 32 bytes, and that the transaction itself follows the above form.  Note this also means the miner has the flexibility to use something other than a merkle tree to combine the alt-chain block hashes if they want; alt-chains should put reasonable limits on PoW size.
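One possible reading of that check, sketched out: the leaf's preimage is 33 bytes because of the appended zero, so it can never collide with an inner node's 64-byte (32 + 32) preimage.  The hash choice and sibling-side encoding here are assumptions:

Code:
import java.security.MessageDigest;
import java.util.Arrays;
import java.util.List;

class MergeMinePath {
    static byte[] pair(byte[] a, byte[] b) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        md.update(a);
        return md.digest(b);
    }

    static boolean verify(byte[] altBlockHash, List<byte[]> siblings,
                          List<Boolean> siblingOnLeft,
                          byte[] scriptSigDigest) throws Exception {
        // Leaf rule: append the zero byte before hashing (33-byte preimage).
        byte[] extended = Arrays.copyOf(altBlockHash, altBlockHash.length + 1);
        byte[] cur = MessageDigest.getInstance("SHA-256").digest(extended);

        for (int i = 0; i < siblings.size(); i++) {
            byte[] sib = siblings.get(i);
            if (sib.length != 32) return false; // the critical length check
            cur = siblingOnLeft.get(i) ? pair(sib, cur) : pair(cur, sib);
        }
        return Arrays.equals(cur, scriptSigDigest);
    }
}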

Alt-chains should also still support the same idea in the coinbase, for miners that don't plan on making large coinbase transactions. (the majority)

Now, the useful part: we can easily add more than one of these transactions, and distinguish them by different sequence numbers in the txin.  You would do that for the UTXO set, with one value defined for just the UTXO set itself, another for a scriptPubKey index, and whatever else gets added as the system is improved.  The previous block where this miner considered the merge-mined UTXO digest to be valid can be specified with nLockTime.  The advantage of this idea is that because the final UTXO digests are completely deterministic, we don't need to build a new P2P network to pass around the digests in the merkle path to the coinbase.

This also shows the advantage of using a separate transaction, because it keeps the UTXO proof size to a minimum, important for applications like fidelity bonded banks where UTXO proofs would become part of fraud proofs; we need to keep proof size to a minimum. Equally being able to prove the hashing power devoted to the UTXO set merge-mine chain is useful, and the only way to do that is provide a bunch of recent UTXO block headers and associated PoW's. Again, keeping the size down for this is desirable as the SPV node examining those headers may be on a low-bandwidth connection.

TierNolan
February 13, 2013, 01:39:22 AM  #262

Quote
Take your example of adding a million transactions, which is unrealistic anyway as a 1MiB block can't have more than a few thousand.

You could do them over multiple blocks.  However, the max depth of the tree is only 256 levels, so you can't actually kill the receiving node and getting a hash match would be almost impossible anyway.

Quote
In a radix tree what you would do is first add each transaction to the tree, keeping track of the set of all modified nodes. Then once you had finished the update you would recalculate the hashes for each changed node, deepest first, in a single O(n) operation.

O(NLog(N)) ?

Each node may cause an update to a full path to the bottom of the tree.

Quote
Anyway, thinking about this a bit more, on reflection you are right and I think this unbalanced tree stuff is a total non-issue in any radix tree or similar structure where depth is a function of common-prefix length. For the 1.6million tx's in the UTXO set you would expect a tree 20 branches deep. On the other hand to find a transaction with n bits in common with another transaction requires n^2 work, so right there you aren't going to see depths more than 64 or so even with highly unrealistic amounts of effort thrown at the problem. (bitcoin has done a 66bit zero-prefix hash IIRC)

It is 2^n work, not n^2, I assume a typo?

Quote
The issue with long depths is purely proof size, and 64*n isn't much worse than 20*n, so as you suggest, why make things complicated?

Right, and it only affects that transaction anyway.

Quote
Yeah, until this is a hard and fast network rule you'll have to check that the hashing power devoted to these UTXO proofs is sufficient.

I have to look up the details of merged mining, but I think you get the same protection as the main chain (at least for the blocks where it works)?

Quote
Speaking of... we should have every one of these UTXO things have a reference to the previous valid UTXO set.

Absolutely.  I assumed that was the plan.

Quote
Equally if we think in terms of merge-mining a chain, adding support for additional UTXO indexes, such as known scriptPubKeys and so forth, is just a matter of adding additional chains to be merge mined, and UTXO-only and UTXO+scriptPubKey can co-exist just fine in this scenario. A disadvantage is we'll need to add some stuff to the P2P network, but I have another idea...

Right, a general way to start new chains would be a good idea.  There would need to be some way to "register" a genesis hash and then that chain would only be used for 1 purpose. 

The key is keeping agreement on the rules for the parallel chains.

Quote
So right now, merge-mined chains have the problem that the proof of work goes through the coinbase transaction.

Yeah, need to look up the process.

Quote
The issue here is that to prove a path to the proof of work, you need the whole coinbase transaction, and it can be quite large, for instance in the case of P2Pool or Eligius. So I'm proposing a new standard, where one transaction of the following form is used to include an additional merge-mined digest, and that transaction will always contain exactly one txin, and one txout of value zero using the following:

scriptSig: <32-byte digest>
scriptPubKey:

Are transactions with 0 in 0 out allowed under the spec?

Quote
Any digest matching this form will be assumed to represent the miner's hashing power, thus miners should not allow such transactions into their blocks blindly. They are currently non-standard, so this will not happen by default, and the scriptSig has no legitimate use. The scriptPubKey is spendable by the scriptSig, so for the txin miners would usually use the txout created by a previous block following this standard. If none are available the miner can insert an additional zero-fee transaction creating a suitable txout (of zero value!) in the block.

So, this basically creates a chain that has been stamped over and over with the output of one tx being the input to the next one.

What happens if you have something like

Root -> none -> none -> Valid1 (from root) -> Valid2 (from valid1) -> Invalid1 (from valid2) -> stamped (from Invalid1) -> ....

There is no way to remove the stamped one.

Better would be including

<sub-chain-id> <previous valid root in sub-chain> <new valid root>

- Anyway, need to read up on merged mining :) -

Quote
The digest would of course represent the tip of a merkle tree. Every merge mined digest in that tree will have a zero byte appended, and then the digests will be hashed together. What would go in the alt-chain is then the merkle path, that is every leaf digest required to get to the transaction, and what side the leafs were on. Note how this is different from the merge-mining standard currently used by namecoin, and fixes the issues it has with conflicting slot numbers.

Appending the zero byte is critical, because it means that to verify an alt-chain block hash was legitimately merge-mined, you simply check that every other digest in the path to the transaction has exactly 32 bytes, and that the transaction itself follows the above form.  Note this also means the miner has the flexibility to use something other than a merkle tree to combine the alt-chain block hashes if they want; alt-chains should put reasonable limits on PoW size.

Alt-chains should also still support the same idea in the coinbase, for miners that don't plan on making large coinbase transactions. (the majority)

Now, the useful part: we can easily add more than one of these transactions, and distinguish them by different sequence numbers in the txin.  You would do that for the UTXO set, with one value defined for just the UTXO set itself, another for a scriptPubKey index, and whatever else gets added as the system is improved.  The previous block where this miner considered the merge-mined UTXO digest to be valid can be specified with nLockTime.  The advantage of this idea is that because the final UTXO digests are completely deterministic, we don't need to build a new P2P network to pass around the digests in the merkle path to the coinbase.

This also shows the advantage of using a separate transaction, because it keeps the UTXO proof size to a minimum, important for applications like fidelity bonded banks where UTXO proofs would become part of fraud proofs; we need to keep proof size to a minimum. Equally being able to prove the hashing power devoted to the UTXO set merge-mine chain is useful, and the only way to do that is provide a bunch of recent UTXO block headers and associated PoW's. Again, keeping the size down for this is desirable as the SPV node examining those headers may be on a low-bandwidth connection.

Peter Todd
February 13, 2013, 05:00:19 AM  #263

Quote
In a radix tree what you would do is first add each transaction to the tree, keeping track of the set of all modified nodes. Then once you had finished the update you would recalculate the hashes for each changed node, deepest first, in a single O(n) operation.

O(NLog(N)) ?

Each node may cause an update to a full path to the bottom of the tree.

No actually. Every operation in a radix tree takes O(k) time, where k is the maximum length of the key in the set. Since Bitcoin transactions have a fixed key length of 256 bits, that's O(1) time. Additionally since the number of intermediate nodes created for a transaction in the tree can't be more than k, a radix tree is O(k*n) space; again O(n) space for Bitcoin.

I mean, it's not so much that you are wrong, it's just that the log2(n) part is bounded by a fixed small number so it really is appropriate to just say O(n), and as I explained, updating a batch of n transactions, especially given that n << N (where N is the total size of the txout set) is an efficient operation. Note that the n in O(n) is the number of new transactions, not the number of existing ones.

Quote
It is 2^n work, not n^2, I assume a typo?

Oops, good catch.

Quote
Yeah, until this is a hard and fast network rule you'll have to check that the hashing power devoted to these UTXO proofs is sufficient.

I have to look up the details of merged mining, but I think you get the same protection as the main chain (at least for the blocks where it works)?

For determining if a block in an alt-chain can be reversed you could be correct under some rules, but in this case each valid PoW is really more like a vote that the UTXO set mined by that PoW is valid. Thus the protection is only the hashing power that mined the blocks containing the PoW's.

Quote
Speaking of... we should have every one of these UTXO things have a reference to the previous valid UTXO set.

Absolutely.  I assumed that was the plan.

You'd hope so, but I'll have to admit I somehow didn't realize that until today.

Quote
Right, a general way to start new chains would be a good idea.  There would need to be some way to "register" a genesis hash and then that chain would only be used for 1 purpose.

The key is keeping agreement on the rules for the parallel chains.

It's tricky though, because the PoW can mean totally different things for different types of chains. Not to mention how for any application but timestamping you need to write a whole set of chain rules too.

That said, following a single merge-mining standard is a good thing; I only proposed this new one because as far as I know namecoin is the only one using the existing multi-chain standard, and that standard sucks. However:




Quote
scriptSig: <32-byte digest>
scriptPubKey:

Are transactions with 0 in 0 out allowed under the spec?

They sure are! The only thing a tx needs is one or more txin's and one or more txouts. Both scriptSigs and scriptPubKeys are allowed to be empty, and the value can be empty. (although you can't spend an empty scriptSig with an empty scriptPubKey; something needs to push a true value to the stack)

Quote
So, this basically creates a chain that has been stamped over and over with the output of one tx being the input to the next one.

What happens if you have something like

Root -> none -> none -> Valid1 (from root) -> Valid2 (from valid1) -> Invalid1 (from valid2) -> stamped (from Invalid1) -> ....

Not quite. The txin is just there because a Bitcoin transaction is only valid if it has a txin; what is actually in the txin is totally irrelevant. The link to the previous "considered good" block has to be within the UTXO header, and that nLockTime would best be used only as an auxiliary bit of data to allow nodes to reproduce the UTXO chain block header after they deterministically compute what the state of the UTXO tree would have been with that block's transactions included. It's just a way of avoiding the pain of implementing the P2P network that really should be holding that data, and getting something working sooner. It's a solution that uniquely applies to a UTXO merge-mined alt-chain; no other type of chain would allow a trick like that.

etotheipi (OP)
February 13, 2013, 11:33:42 AM  #264

Not much time to respond now, but I wanted to point out that the PATRICIA tree concept has no balancing issues -- if it looks like it does, it's strictly an illusion.  The reason is this:

Take a standard, non-level-compressed trie, as shown in my previous post.  There is no question that this is an O(1) data structure:  if you use a 256-way trie, it always takes exactly 20 hops to get from root to the leaf of a 20-byte address string.  It has O(1) query, O(1) insertion, O(1) deletion.

All operations on a PATRICIA tree are strictly cheaper than the equivalent operations on a plain trie.  Basically, the upper bound of computation time for any PATRICIA tree operation is that of the non-level-compressed trie.  It's just that the number of hops that you shortcut as a benefit of level-compression is variable, depending on the data in the tree/trie.  The amount of operations you get to skip can be altered by an "attacker", but the worst they can do to you is require the performance of a trie -- which is O(1).

Re: skip strings:  there's no way you can use a basic trie for this -- the space overhead of all the intermediate (and unnecessary) nodes would overwhelm the data that is being stored.  In fact, even with level compression and linked-list nodes, I'm still very concerned about the space overhead -- I suspect we may be storing something like 3x the size of the underlying UTXO set just to store this DB.  I just pulled that number out of nowhere (3x), because I haven't rigorously explored it for various branching factors ... but it was one of the downsides of the trie-based structures compared to the BSTs.


TierNolan
February 13, 2013, 02:39:06 PM  #265

Quote
Not much time to respond now, but I wanted to point out that the PATRICIA tree concept has no balancing issues -- if it looks like it does, it's strictly an illusion.

Right.  Also, since the keys are cryptographic hashes, it is hard to push the tree much deeper than the expected depth, even if you want to.

Quote
Re: skip strings:  there's no way you can use a basic trie for this -- the space overhead of all the intermediate (and unnecessary) nodes would overwhelm the data that is being stored.

The skip strings are overly complex, since everything is "random".

I was thinking of having a "leaf" type node.  The skip system is just overly complex.

A parent node has pointers to its children.

A leaf node has the 256 bit hash.

If you have a 256-pointer array for every node, then you have too many of them.

The bottom level of parent nodes will likely have a small number of children.  If you implement it as a 256-pointer array, then for each leaf you are adding ~128 unneeded pointers (the ~254 empty slots split between the 2 children).  On a 32-bit machine, that is 128 * 4 = 512 bytes per leaf.

Your data is a 32-byte hash, so the overhead is massive.

However, that assumes that the 2nd to bottom nodes only have 2 sub-nodes.  Not sure what the actual odds are.

A more efficient structure would be:

Code:
abstract class TreeNode {
    protected final int level; // the depth of this node
    protected final byte key;  // the key byte = hash[level]

    protected TreeNode(int level, byte key) {
        this.level = level;
        this.key = key;
    }

    // True if this node lies on the path for the given key hash.
    public boolean matchesKey(byte[] hash) {
        return hash[level] == this.key;
    }
}

class ParentNode extends TreeNode {

    private TreeNode[] children; // power-of-2 array of children

    ParentNode(int level, byte key, TreeNode[] children) {
        super(level, key);
        this.children = children;
    }

    public TreeNode getMatchingChild(byte[] hash) {
        // Mask the byte to 0-255 before reducing modulo the array size.
        TreeNode child = children[(hash[level + 1] & 0xff) % children.length];
        if (child == null) {
            return null;
        }
        // A non-null slot may still be a collision; confirm the key byte.
        return child.matchesKey(hash) ? child : null;
    }
}

class LeafNode extends TreeNode {

    byte[] data = new byte[32]; // the 256-bit hash stored at this leaf

    LeafNode(int level, byte key) {
        super(level, key);
    }
}

To add a node, you use root.getMatchingChild() until null is returned.  Then you have to add the hash to the last non-null node.

Having said that, that is the data structure.

The merkle tree structure would also need to be specified.

etotheipi (OP)
February 13, 2013, 03:08:57 PM  #266

The benefit of using a PATRICIA/Brandais hybrid is that it's a "standard" datastructure.  And it's one that seems to fit this use case pretty perfectly.  I just followed your pseudocode and I think you are basically proposing the same thing as the PATRICIA tree, but using implicit skip strings instead of explicit ones.  It can be reasonably argued that the skip strings are not necessary to authenticate the structure, but I think you are introducing extra complexity into the algorithms for inserting and deleting nodes, in exchange for simplifying authentication.  It's very difficult to describe without a well-thought-out example, but you can probably see it yourself if you try to work out the logic for worst-case complexity of an insert or delete operation.  It involves uncompressing, inserting/removing nodes, recompressing on both sides, and then rearranging a ton of pointers.  If the skip strings are not attached to the nodes, you have to recompute them from the children, which involves a lot of string compares and is probably a bit slower.  I had to program the insert function for a PATRICIA tree as a homework assignment in college once.  It was a mess... and kinda fun :)

As for the comment about node overhead:  I couldn't follow your math (looks like you were mixing 256-way trienodes with binary trie nodes), but I think something you're overlooking is the linked lists for the pointers at each node (assuming some power-of-2 branching factor more than 2^1).  This means that nodes with only one child only store one pointer.  It is wasteful for the top levels, where you have linked-list overhead for super-dense nodes, but the majority of data is near the leaves, where the optimization saves you a ton of space.  Plus, because the top nodes are constantly changing, you can optimize them in code pretty easily to be more pleasant for both query time and space consumption.

I need to think a little harder about the benefits of using a binary PATRICIA tree... it pretty much removes the necessity for the Brandais aspect of it (compressing node children into linked lists), and certainly adds a bit of compute efficiency at the expense of taking more space (I think the binary trie will be faster at updating the authentication data, but will have more intermediate/branch nodes to be stored).

Unfortunately, I have too many super-high priority things on Armory in the next month or two, so there's no way I can get around to playing with this, yet.  However, I may eventually be doing some kind of electrum-style version Armory, in which I will split Armory both ways -- into an Armory supernode and Armory litenode.  The supernode will do something like we're discussing here, and the lite nodes will depend on having access to a trusted supernode.  It sounds like a good time to prototype this idea...

TierNolan
February 13, 2013, 03:30:31 PM  #267

Quote
As for the comment about node overhead:  I couldn't follow your math (looks like you were mixing 256-way trienodes with binary trie nodes),

I was just looking at the diagram :).  It shows that each node has a full-sized array.  With a binary tree, that is just 2 pointers, so almost no cost.

For a 256-way tree, this means that the 2nd to lowest nodes (which would be sparse) would be 256 element arrays and only have a small number of children.  So, the cost per child is quite high.

With the way I suggested it, you get the benefits of a 256-way tree, but still have small nodes.  That is the difference between 256 pointer lookups vs 32 pointer lookups.

The size of a binary tree is approx double the number of children, which isn't that much larger than a 256 way one.

Under a HashMap scheme, the top-level nodes would have more than 128 children, so they would all end up as flat arrays.

Quote
I need to think a little harder about the benefits of using a binary-PATRICIA tree... it pretty much removes the necessity for the Brandais aspect of it (compressing node children into linked lists),

As I said, hashmaps should be used instead of the linked lists.

Quote
Unfortunately, I have too many super-high priority things on Armory in the next month or two

Yeah, days need more than 24 hours.

I think the easiest approach is to define the merkle tree in a way that is easiest to visualize and leave implementation details until later.  Of course, that assumes the merkle structure doesn't cause implementation problems.

If you defined the merkle hash of a node with only one child as equal to that child's hash, then you can just define the tree as a full-depth tree and leave pruning to the implementation.

Hash(leaf-node) = leaf's key

Hash(parent with one child) = Hash(child)

Hash(parent with 2 children) = Merkle(Hash(child1), Hash(child2))
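Those three rules are compact enough to state directly in code; a sketch, with SHA-256 standing in for the Merkle pair function:

Code:
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.List;

class MerkleRule {
    static class TrieNode {
        byte[] key;                                  // leaf's 256-bit key
        List<TrieNode> children = new ArrayList<>(); // 0, 1, or 2 entries
    }

    static byte[] merkle(byte[] a, byte[] b) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        md.update(a);
        return md.digest(b);
    }

    static byte[] hash(TrieNode n) throws Exception {
        if (n.children.isEmpty()) return n.key;    // leaf: its own key
        if (n.children.size() == 1)                // single child: pass up
            return hash(n.children.get(0));
        return merkle(hash(n.children.get(0)),     // two children: Merkle
                      hash(n.children.get(1)));
    }
}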

TierNolan
February 14, 2013, 10:33:03 AM  #268

I was thinking about proving that the alt chain is correct for the lightweight clients.

There could be an additional "falsification" link where you show that a link was not valid.

A falsification node basically jumps the chain back to before that node.

A -> B -> C -> D -> False(C)

Chain becomes

A -> B -> C -> D -> False(C) -> C* -> D*

The false node wouldn't require any difficulty.  Since C and D both met the difficulty requirements, this doesn't actually cause spam.  Also, the false node should ideally propagate as fast as possible and not be mined in.

The false node could be of the format

Hash of last node
Hash of false node
Hash of proof

The proof could be an "attachment" but isn't used by C* when forming the "parent node" hash.  That way light clients can download the headers only.

Light clients would just make sure that the alt chain forms a chain.  False links don't have to be merge mined, since they can be directly proven.  The link would incorporate all the data required to verify it.

Light nodes could do random checking as they download the chain.  If each checker checks 0.1% of the updates, then with 1000 users, most historical errors would be detected.  Nodes could also check 1% of all new updates.

Also, light nodes could check new links at random for the same reason.

The "expanded" data for the node would include all txs added and removed for that node.  A light node could then check that this causes the update as claimed.

I think the tree should include the block in which the TXO was created as part of its "hash".

The "signature" of an UTXO would be {hash(tx),var_int(block number}, var_int(transaction number), var_int(output number)}.  This would add 5 extra bytes to the 32 byte hash.

This allows the light node to query the main bitcoin chain to check the transaction inputs.
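A sketch of that compound identifier with Bitcoin-style var_int encoding; the 5-extra-bytes figure works out if the block number takes a 3-byte varint and the other two take 1 byte each.  Names and the omitted larger encodings are illustrative:

Code:
import java.io.ByteArrayOutputStream;

class UtxoSignature {
    // Bitcoin-style var_int: 1 byte below 0xfd, 3 bytes up to 0xffff
    // (the 5- and 9-byte encodings are omitted in this sketch).
    static void writeVarInt(ByteArrayOutputStream out, long v) {
        if (v < 0xfdL) {
            out.write((int) v);
        } else if (v <= 0xffffL) {
            out.write(0xfd);
            out.write((int) v);        // little-endian low byte
            out.write((int) (v >> 8)); // high byte
        } else {
            throw new IllegalArgumentException("larger encodings omitted");
        }
    }

    // {hash(tx), var_int(block), var_int(tx index), var_int(output index)}
    static byte[] encode(byte[] txHash, long block, long txIndex, long output) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(txHash, 0, txHash.length); // 32-byte transaction hash
        writeVarInt(out, block);
        writeVarInt(out, txIndex);
        writeVarInt(out, output);
        return out.toByteArray();
    }
}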

I wonder if an all-light-node network would be possible under this system.  Each node would only have to store some of the bitcoin blockchain data, as long as the total storage capacity was large enough to store it (with a large margin).

Random checking would pick up any double spend attempts very quickly.

etotheipi (OP)
February 14, 2013, 12:38:44 PM  #269

If it looks like the sparse, lower-level nodes have 256 pointers, you're looking at the wrong picture (or I didn't explain it well enough).  The Brandais part is where the list of pointers is compressed to a linked list, and thus there are only as many pointers as there are nodes (well, a little more, for the linked-list forward-pointers).  This does increase search time, since you have to forward-traverse the linked list at each node, but the lists will be ordered, which means lookups can be further optimized and should be fast (in fact, they can be easily stored as simple lists, replacing the list on every update, since it's probably just as fast to do that with disk-based DB accesses).  The important part is that if a parent has 5 children, those 5 children are in lexicographical order, and only their 5 "fingerprints" will be used in the computation of the parent's "fingerprint."

I don't think a hashmap works for this.  I guess it depends on the kind of hashmap you're talking about -- but if it's a "simple" one where there are lots of collisions, you end up with some non-determinism based on insert order, and removing elements is complicated.  But maybe I don't totally understand what you're proposing.

TierNolan
February 14, 2013, 01:41:06 PM  #270

Quote
If it looks like the sparse, lower-level nodes have 256 pointers, you're looking at the wrong picture (or I didn't explain it well enough).  The Brandais part is where the list of pointers is compressed to a linked list, and thus there are only as many pointers as there are nodes (well, a little more, for the linked-list forward-pointers).

I think we are discussing different trees.

I was effectively suggesting an improvement on the Brandais modification: use a hashmap instead of a linked list.

Basically, have a power of 2 array of pointers.

Then you can see if there is a match with

Code:
// Mask into the power-of-2 array; guard against an empty slot.
Child c = arr[keyByte & (arr.length - 1)];

if (c != null && c.getKeyByte() == keyByte) {
    // child matches
}

If not, then it is a hash collision and you need to double the size of the array and re-hash.

The array could be reduced in size on removal, say when it drops below 25%.

Quote
I don't think a hashmap works for this.  I guess it depends on the kind of hashmap you're talking about -- but if it's a "simple" one where there are lots of collisions, you end up with some non-determinism based on insert order, and removing elements is complicated.  But maybe I don't totally understand what you're proposing.

The only non-determinism is that some insertions will be instant (no collision) and some will require re-hashing.  This could slow things down.

If that is an issue, you could spread the re-hash out over many insertions and deletions.  So, when you insert a key, it might do 2 re-hash steps per level, but the work is bounded.

Also, removal would need a rule for when to shrink the array.

Maybe increase when a collision occurs and decrease when the array drops below 25% full.  However, that could oscillate.
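
A minimal sketch of the grow-on-collision rule under discussion (Python for brevity; names are illustrative, and the shrink-below-25% rule and the amortized re-hash are left out):

Code:
class ChildMap:
    """Power-of-two array of children, keyed by one key byte (0-255)."""
    def __init__(self):
        self.arr = [None]

    def get(self, key_byte):
        e = self.arr[key_byte & (len(self.arr) - 1)]
        return e[1] if e is not None and e[0] == key_byte else None

    def insert(self, key_byte, child):
        size = len(self.arr)
        entries = [e for e in self.arr if e is not None] + [(key_byte, child)]
        while True:
            slots = [None] * size
            ok = True
            for kb, ch in entries:
                i = kb & (size - 1)
                if slots[i] is not None and slots[i][0] != kb:
                    ok = False          # collision: double the array, re-hash
                    break
                slots[i] = (kb, ch)
            if ok:
                self.arr = slots
                return
            size *= 2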

1LxbG5cKXzTwZg9mjL3gaRE835uNQEteWF
TierNolan
Legendary
*
Offline Offline

Activity: 1232
Merit: 1077


View Profile
February 15, 2013, 10:04:18 PM
 #271

Are transactions with 0 in 0 out allowed under the spec?

They sure are! The only thing a tx needs is one or more txin's and one or more txouts. Both scriptSigs and scriptPubKeys are allowed to be empty, and the value can be empty. (although you can't spend an empty scriptSig with an empty scriptPubKey; something needs to push a true value to the stack)

So, either use a previous "open" one of these transactions, or create a new one.  Are transactions handled sequentially in a block?

Can you spend an output of transaction 1 in transaction 7?  If so, then the miner could add a zero transaction as one of the coinbase outputs.

Also, if the number of normal transactions was a power of 2, then the proof gets much less complex.

Assuming 4 "real" transaction and the 5th one as M.

Under the Merkle rules, you expand the number of transactions to a power of 2.

A B C D M M M M

A - D are normal

M contains the merge-mined signature.

To prove M is in the block, you just need to provide Merkle-Hash(A, B, C, D) and M.

The entire RHS of the tree is obtained by hashing M once for each level in the tree with the merkle rule.  This can be performed pretty fast.

I think the special transaction should have a magic number.  This would make it slightly larger, but if you added a 32-byte random "official" magic number, then it is very unlikely that it would happen accidentally.
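
A minimal Python sketch of that "hash M once per level" shortcut, assuming double-SHA256 and the 8-leaf layout above (A-D real, M duplicated):

Code:
import hashlib

def sha256d(b):
    return hashlib.sha256(hashlib.sha256(b).digest()).digest()

def root_from_summary(merkle_hash_abcd, m, levels=2):
    # The entire RHS is M paired with itself once per level below the top.
    rhs = m
    for _ in range(levels):
        rhs = sha256d(rhs + rhs)
    # Root = Merkle-Hash(A, B, C, D) joined with the folded RHS.
    return sha256d(merkle_hash_abcd + rhs)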

1LxbG5cKXzTwZg9mjL3gaRE835uNQEteWF
TierNolan
Legendary
*
Offline Offline

Activity: 1232
Merit: 1077


View Profile
February 16, 2013, 03:24:53 AM
 #272

As an aside, roughly, what is the total number of unspent transaction outputs at the moment?

1LxbG5cKXzTwZg9mjL3gaRE835uNQEteWF
solex
Legendary
*
Offline Offline

Activity: 1078
Merit: 1002


100 satoshis -> ISO code


View Profile
March 08, 2013, 05:38:36 AM
Last edit: March 09, 2013, 09:21:59 AM by solex
 #273

As an aside, roughly, what is the total number of unspent transaction outputs at the moment?

It would also be interesting to know what percentage are <COIN_DUST.

gmaxwell
Moderator
Legendary
*
expert
Offline Offline

Activity: 4158
Merit: 8343



View Profile WWW
March 08, 2013, 06:46:50 AM
 #274

So some discussions from the inflationproofing thread provided two additional requirements for the UTXO tree:

It needs to be a sum-tree over txout value, so that the UTXO root also shows the currency hasn't been inflated, and allows stochastic UTXO checks. That's easy enough, just an implementation detail.

The other thing is that we need some way of testing that randomly selected transactions in a block are spending inputs that were in the UTXO set (not spending coins from thin air). That's easy -- except when they're spending txouts created in the same block. Is there a way to accommodate that without requiring a separate by-output lookup against transactions in the blocks as a p2p message?
apetersson
Hero Member
*****
Offline Offline

Activity: 668
Merit: 501



View Profile
March 08, 2013, 07:55:46 AM
 #275

The other thing is that we need some way of testing that randomly selected transactions in a block are spending inputs that were in the UTXO set (not spending coins from thin air). That's easy -- except when they're spending txouts created in the same block. Is there a way to accommodate that without requiring a separate by-output lookup against transactions in the blocks as a p2p message?

I would think there is an obvious, straightforward way.
Let's say a node requests matching transactions from a different node which has already received the latest block, using bloom filtering.
If a transaction has dependencies outside of the UTXO set, all parent transactions are also automatically provided until they are all included, even if they do not match the bloom filter.
If the node waited until the next block instead, the case becomes "trivial" again.
TierNolan
Legendary
*
Offline Offline

Activity: 1232
Merit: 1077


View Profile
April 14, 2013, 09:49:06 AM
Last edit: April 14, 2013, 12:23:31 PM by TierNolan
 #276

It needs to be a sum-tree over txout value, so that the UTXO root also shows the currency hasn't been inflated, and allows stochastic UTXO checks. That's easy enough, just an implementation detail.

So, the hash entry becomes 40 bytes instead of 32.  The extra 8 bytes are the total in satoshis of the UTXOs under that node (21 million BTC is about 2.1 * 10^15 satoshis, so the total needs a uint64, not 4 bytes).

For a node with 1 child:
hash(node) = hash(child)

For a node with 2 children:
hash().hash -> the 32-byte hash (sha256 squared)
hash().value -> total value (uint64, bounded by the 21 million BTC supply)
hash() -> hash().hash : hash().value

hash(node).hash = sha256(sha256(hash(child1) : hash(child2)))
hash(node).value = hash(child1).value + hash(child2).value
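
A minimal sketch of that node rule, assuming values are carried as uint64 satoshis next to each 32-byte hash:

Code:
import hashlib, struct

def sha256d(b):
    return hashlib.sha256(hashlib.sha256(b).digest()).digest()

def parent(child1, child2):
    """Each child is (hash_bytes, value_in_satoshis); returns the parent pair."""
    h = sha256d(child1[0] + struct.pack('<Q', child1[1]) +
                child2[0] + struct.pack('<Q', child2[1]))
    return (h, child1[1] + child2[1])  # parent value = sum of the children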

Quote
The other thing is that we need some way of testing that randomly selected transactions in a block are spending inputs that were in the UTXO set (not spending coins from thin air). That's easy -- except when they're spending txouts created in the same block. Is there a way to accommodate that without requiring a separate by-output lookup against transactions in the blocks as a p2p message?

That actually seems to be a sub-problem of the main problem that we are trying to solve, but at the sub-block level.  You can prove the TxOut was created by transaction 10 in the current block, but if it is used in transaction 100, how do you know whether it was spent somewhere between transactions 11 and 99?

I think a merkle tree over all the intermediate states of the block would be required.  This actually might not be that big a deal: the miner already has to add and remove all the UTXOs from the tree anyway, which means collecting them into an array and computing the merkle root.

The block header (or alt-chain header) would have 2 new fields (see the sketch below):
Root of UTXO tree
Merkle Root of roots of the UTXO tree for all transactions in this block
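
A minimal sketch of computing the second field, assuming Bitcoin's convention of duplicating the last entry on odd-length levels:

Code:
import hashlib

def sha256d(b):
    return hashlib.sha256(hashlib.sha256(b).digest()).digest()

def merkle_root(roots):
    """roots: the per-transaction UTXO-tree roots, in block order."""
    level = list(roots)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the odd one out
        level = [sha256d(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]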

1LxbG5cKXzTwZg9mjL3gaRE835uNQEteWF
Stampbit
Full Member
***
Offline Offline

Activity: 182
Merit: 100



View Profile
April 14, 2013, 05:42:47 PM
 #277

This certainly sounds like a very well-engineered solution; any idea if it will ever be implemented?
TierNolan
Legendary
*
Offline Offline

Activity: 1232
Merit: 1077


View Profile
April 20, 2013, 06:24:48 PM
 #278

I was thinking about this as a pure system that would allow dropping the blockchain storage (except for recent blocks) completely.  Effectively, the data associated with coins would be moved to the owners of the coins.

You need to store the merkle path down to the transaction that created the coin.  This won't change with time.

However, to spend it you also need a live path from the latest root.

This tree could be stored in a distributed fashion.  It only needs to be stored for a while, but the longer the better.

Your client could try to keep track of the tree down to your transactions.  The amount of data depends on the number of transaction outputs.  If there were 1 trillion outputs, then you would need to store around 40 levels (log2 of 10^12 is about 40) until yours was the only transaction left on the path.

This works out at 40 * 32 bytes = 1280 bytes per block per coin.  This data rate could be decreased by combining all your coins into one and/or reducing the time between requests for updated data.

The longer the network stores update data, the less often nodes need to update.

1LxbG5cKXzTwZg9mjL3gaRE835uNQEteWF
etotheipi (OP)
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1093


Core Armory Developer


View Profile WWW
April 20, 2013, 06:33:54 PM
Last edit: April 20, 2013, 06:56:07 PM by etotheipi
 #279

I was thinking about this as a pure system that would allow dropping the blockchain storage (except for recent blocks) completely.  Effectively, the data associated with coins would be moved to the owners of the coins.

You need to store the merkle path down to the transaction that created the coin.  This won't change with time.

However, to spend it you also need a live path from the latest root.

This tree could be stored in a distributed fashion.  It only needs to be stored for a while, but the longer the better.

Your client could try to keep track of the tree down to your transactions.  The amount of data depends on the number of transaction outputs.  If there were 1 trillion outputs, then you would need to store around 40 levels (log2 of 10^12 is about 40) until yours was the only transaction left on the path.

This works out at 40 * 32 bytes = 1280 bytes per block per coin.  This data rate could be decreased by combining all your coins into one and/or reducing the time between requests for updated data.

The longer the network stores update data, the less often nodes need to update.

That's a super interesting idea.  I'll have to sleep on that one.

Right now, everyone tracks everyone's coins, and you are responsible for maintaining your own private keys.  In your system, you maintain your own private keys and the subtree information about the coins contained within.   Nodes don't have to keep that information, because they only need it when the coins are being spent, and you are the only one who ever spends the coins.

Sure, part of the UTXO set could be "lost" if your computer crashes, but those coins are lost, too, so it doesn't really matter...?  If the coins will never be spent, the sub-tree never changes, and so nodes don't care what's inside that subtree, as they have its fingerprint.   I have not had time to think through the implications (or complications) of it, but it's a super-interesting thought-experiment.


P.S. -- This is yet another area where the trie-based structures win -- since there is branch independence in tries, peers don't have to store what's below a certain node as long as what's below it is not changing (they only need the fingerprint of that node after the last update to it).  If you use a BST, this doesn't work, because updates in neighboring branches can cause rebalances, forcing you to know what's in this branch so that nodes can be shuffled between branches.


Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
TierNolan
Legendary
*
Offline Offline

Activity: 1232
Merit: 1077


View Profile
April 20, 2013, 06:40:30 PM
 #280

This is yet another area where the trie-based structures win -- since there is branch independence in tries, peers don't have to store what's below a certain node as long as what's below it is not changing (they only need the fingerprint of that node after the last update to it).

Right.  Just use the transaction hash directly as key and accept that there might be imbalances.  However, the imbalances are not really going to happen because you are using a hash (effectively random) value as key.  So the law of large numbers does tree balancing for you.

The way I would see it is that the owner stores the historical path and the network stores the live state of the tree.  However, when your client connects it boosts the local part of the tree near its coins.  Telling nodes about the local area helps secure your own coins.

1LxbG5cKXzTwZg9mjL3gaRE835uNQEteWF
hazek
Legendary
*
Offline Offline

Activity: 1078
Merit: 1002


View Profile
April 20, 2013, 06:44:27 PM
 #281

And wouldn't this also remove traceability, since now other nodes only have a fingerprint of the transaction(s) with which you received your coins, and not the entire history anymore? I like this idea a lot on the surface.

My personality type: INTJ - please forgive my weaknesses (Not naturally in tune with others feelings; may be insensitive at times, tend to respond to conflict with logic and reason, tend to believe I'm always right)

If however you enjoyed my post: 15j781DjuJeVsZgYbDVt2NZsGrWKRWFHpp
etotheipi (OP)
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1093


Core Armory Developer


View Profile WWW
April 20, 2013, 06:48:45 PM
 #282

Right.  Just use the transaction hash directly as key and accept that there might be imbalances.  However, the imbalances are not really going to happen because you are using a hash (effectively random) value as key.  So the law of large numbers does tree balancing for you.

Just to clarify:  tries/PATRICIA trees/de la Brandais trees do not have balancing issues.  They are all tightly bounded to a maximum number of operations for queries, inserts, and deletes.  It's just that the optimizations of PATRICIA/Brandais bring you far below that constant upper bound.  Thus, "unbalancing" simply removes optimization, but you're still operating well within the confines of constant time, no matter what the tree structure looks like.  

The distinction only matters if there were reason to believe those optimizations are necessary to make this idea feasible.  I do not believe that is the case here.   In terms of access times, I believe even a regular old trie (forcing full traversal of each path) would still work.


Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
TierNolan
Legendary
*
Offline Offline

Activity: 1232
Merit: 1077


View Profile
April 21, 2013, 12:45:46 AM
 #283

Another feature (or disadvantage) is that it allows dropping of extra info added into the blockchain.

For the system to work, all you need is lots of sha(sha(value)) -> value mappings.  The values are always 2 x 40 bytes.  Values are always "hash(child1);coins(child1);hash(child2);coins(child2)".

This means that there is no bloat.  It is up to the coin owner to keep the full transaction data, and they only submit it when spending.

You can still timestamp documents, but not add data to the blockchain as a permanent record.

1LxbG5cKXzTwZg9mjL3gaRE835uNQEteWF
etotheipi (OP)
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1093


Core Armory Developer


View Profile WWW
April 22, 2013, 04:58:00 PM
 #284

gmaxwell pointed out the obvious flaw in this proposal:  you can supply the input branches to prove that the TxOuts you are spending exist, but you can't supply the destination branches, so full nodes have no idea how to update the sub-branches of the target address.  Even if they know that this is the first UTXO for that address, there may be lots of other branches on the way down to that node which are unknown to them.

There's no way around this, other than just having full nodes store the entire trees.  Which means we're back to square one Sad

Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
TierNolan
Legendary
*
Offline Offline

Activity: 1232
Merit: 1077


View Profile
April 22, 2013, 05:26:47 PM
 #285

There's no way around this, other than just having full nodes store the entire trees.  Which means we're back to square one Sad

That info has to be stored.  I see it as: the live tree (and maybe 50-100 blocks of history) needs to be stored.

However, you could do it in a distributed fashion.  Every node in the tree has to be stored somewhere.

The spender could provide the old path and a new path that was correct within the last 50 blocks.  The top of the tree, which would change every block, would be live for all full nodes anyway.

You only have to look at transactions that start with the same prefix as yours to see if the hash path up to the root changes.

1LxbG5cKXzTwZg9mjL3gaRE835uNQEteWF
etotheipi (OP)
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1093


Core Armory Developer


View Profile WWW
May 02, 2013, 09:48:23 PM
 #286

I had a little revelation last night, while thinking about this proposal.  In hindsight, it seems so simple.  But hindsight is always 20/20, right?  My thought process was:  I've implemented RAM-based PATRICIA trees before, but what's a good way to do this on disk?  For instance, I want to implement this in LevelDB, so I need some way to make LevelDB behave like a memory space.

One of the other issues with the PATRICIA/hybrid approach is that there's a lot of data needed to store pointer lists, etc.  It does have quite a bit of overhead.  And you don't want to optimize it in such a way that limits the generic-ness of the structure. I'd prefer to maintain the textbook-generic-ness of this data structure, and let implementations do their own optimizations, as long as they can convert and reproduce the same calculations.  

The revelation was that you don't need to replicate a memory space with abstract pointers to each trie-node and leaf.  You can store them based on their node-prefix value, and the DB will auto-sort the values in depth-first-search order.  For instance, let's take this structure:



All you need to do is store everything by its prefix.  Here's what the DB entries would look like:

Quote
Key -> Value
""     -> RootHash, SumValue, 3, "1", "3", "6"
"1"    -> NodeHash, SumValue, 2, "1", "3"
"11"   -> NodeHash, SumValue, 2, "2", "3"
"1122" -> LeafHash, Value
"1137" -> LeafHash, Value
"1342" -> LeafHash, Value
"3333" -> LeafHash, Value
"678"  -> NodeHash, SumValue, 3, "0", "5", "9"
"6780" -> LeafHash, Value
"6785" -> LeafHash, Value
"6789" -> LeafHash, Value

Each "numChildren" value (after the SumValue) can be exactly one byte, because you never have more than 256 ptrs, and each child pointer is also exactly 1 byte.  If you want to jump to a particular child, for instance, you are at node "11" and want to go the child at 3, you simply do iter->Seek("11"+"3") and it will skip "1122" and put the iterator right at "1137", which is the first database value >= "113".


Furthermore, you might be able to get away without even any pointers!  You might just store the node/leaf hash and value, and know about children after the fact, simply by continuing your iteration.  You are at IterA, and IterB=IterA.Next().   You know that IterB is a child node of IterA because IterB.key().startswith(IterA.key()).   That's stupid simple.  

So, you know what level you're at simply by looking at Iter.size()
So, you know that you are a child because IterNext.key().startswith(IterPrev.key()).
If the previous check fails, you know you finished traversing that branch and you can update IterPrev.

Though, there may be something I'm missing that would still require you to store the pointers.  But it's still a lot better than storing 6-8 bytes per pointer, which is where I originally thought the bulk of the data was going to end up.

Even better, you don't really have to implement the minutiae of the PATRICIA tree, because it's kind of done automatically by the nature of a key-sorted database.  The database inserts everything in the correct place for you, and it just so happens that tries and PATRICIA trees get iterated the same way, without having to store structure information.  On the contrary, a depth-first search on a BST will also be sorted this way but you have to store data at each node about the local structure of the tree, and update all the nearby nodes if there's a rebalance.  Since the PATRICIA tree has a deterministic structure based solely on the inclusive set, you can insert and remove nodes without any extra seek/updates, and natural iteration over the dataset will result in the right answer as if you implemented a full PATRICIA tree.
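
A minimal sketch of that Seek behavior, emulating a key-sorted DB with the stdlib (keys taken from the example table above; a real node would use LevelDB's iterator):

Code:
import bisect

db = sorted(["", "1", "11", "1122", "1137", "1342",
             "3333", "678", "6780", "6785", "6789"])

def seek(target):
    """Return the first key >= target, like LevelDB's Seek()."""
    i = bisect.bisect_left(db, target)
    return db[i] if i < len(db) else None

print(seek("11" + "3"))                 # -> "1137", skipping "1122"
print(seek("1137").startswith("11"))    # child test by key prefix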

Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
ThomasV
Legendary
*
Offline Offline

Activity: 1896
Merit: 1343



View Profile WWW
May 03, 2013, 09:50:20 AM
 #287

subscribing

Electrum: the convenience of a web wallet, without the risks
TierNolan
Legendary
*
Offline Offline

Activity: 1232
Merit: 1077


View Profile
May 05, 2013, 11:48:09 AM
 #288

I had a little revelation last night, while thinking about this proposal.  In hindsight, it seems so simple.  But hindsight is always 20/20, right?  My thought process was:  I've implemented RAM-based PATRICIA trees before, but what's a good way to do this on disk?  For instance, I want to implement this in LevelDB, so I need some way to make LevelDB behave like a memory space.

Assuming there are 4 billion UTXOs, that means that the tree will be dense for the first 32 bits on average.  All leaf nodes will have 256 - 32 = 224 bits of data each.

If you just store all the transaction hashes in the tree in full, then you need 32 bytes per entry, instead of 28 bytes, so you aren't really saving much.

Having a fixed 32 bytes per entry would mean that the file has fixed width entries, which would make seeking easier.

The only exception is multiple outputs of the same transaction.  Each leaf could have a list of outputs and how much coin is in each.  This breaks the fixed field length, though.

The UTXO-id would be {tx-hash, out-index, value}.

You effectively save

{tx-hash, total-value}, {out-index, value}, .... {out-index, value}, {end-delimiter}
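
A minimal sketch of serializing that variable-length leaf record (the field widths and the 0xff end-delimiter are illustrative assumptions):

Code:
import struct

def leaf_record(tx_hash, outputs):
    """outputs: list of (out_index, value_in_satoshis) pairs."""
    total = sum(v for _, v in outputs)
    rec = tx_hash + struct.pack('<Q', total)      # {tx-hash, total-value}
    for idx, val in outputs:
        rec += struct.pack('<HQ', idx, val)       # {out-index, value}
    return rec + b'\xff'                          # {end-delimiter}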

1LxbG5cKXzTwZg9mjL3gaRE835uNQEteWF
hazek
Legendary
*
Offline Offline

Activity: 1078
Merit: 1002


View Profile
May 05, 2013, 01:48:48 PM
 #289

For me, this is the most exciting thread on this forum.

My personality type: INTJ - please forgive my weaknesses (Not naturally in tune with others feelings; may be insensitive at times, tend to respond to conflict with logic and reason, tend to believe I'm always right)

If however you enjoyed my post: 15j781DjuJeVsZgYbDVt2NZsGrWKRWFHpp
etotheipi (OP)
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1093


Core Armory Developer


View Profile WWW
May 05, 2013, 10:45:12 PM
 #290

For me, this is the most exciting thread on this forum.

Smiley I've actually received some pressure to start implementing this myself, with some urgency.  I have resisted solely because I'm totally swamped with other things, and expect I'll get into it in about 6 months.  I felt guilty about that, but I have some personal/selfish reasons.

But now I don't feel so bad.  It seems like, once every month, I have some revelation about how this could be improved, or I solve some aspect of it that I wasn't sure how to solve earlier.  Now I am comfortable with downloading from unsynchronized peers and/or having multiple blocks generated while downloading that data, and I feel like I have a really good way to encode this with high space-efficiency.  This is making it all the easier for me to imagine implementing this, when I finally have time.  Or maybe someone else will.



Talking about the non-sync'd downloading (link in the previous paragraph), I just wanted to add a comment:  I noticed that LevelDB has read-snapshots, and it looks like other DB engines do, too.  (Do most of them?)  It certainly would simplify this even further. For instance, consider that I ask a node to send me some branch of the tree, and two new blocks come in after the download starts, causing that peer to update the tree while it is in the process of sending it to me.  In a completely naive system, I would end up with internally inconsistent data, and no good way to avoid it short of getting lucky and having no new blocks arrive while downloading.

However, if you are using a read snapshot, you can essentially freeze the DB state in time, so that you can read its contents without worrying about any updates since it was frozen.  You just throw away the snapshot when you're done.  I assume it does this efficiently, essentially storing the difference data accumulated since the snapshot was taken, and rewinding those differences when you retrieve data from the tree.  This makes everything even more feasible.
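
A minimal sketch of the idea with LevelDB's Python bindings (assuming the plyvel package; the path and keys are illustrative):

Code:
import plyvel

db = plyvel.DB('/tmp/utxo-trie', create_if_missing=True)
snap = db.snapshot()                  # freeze a consistent view

db.put(b'1137', b'new-leaf-hash')     # new blocks keep mutating the live DB

# Serve the requested branch from the frozen view, unaffected by updates:
for key, value in snap.iterator(prefix=b'11'):
    pass                              # send (key, value) to the peer

snap.close()                          # throw the snapshot away when done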

Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
Come-from-Beyond
Legendary
*
Offline Offline

Activity: 2142
Merit: 1009

Newbie


View Profile
May 10, 2013, 07:03:23 AM
 #291

Could anyone let us know the current progress in implementation of this idea?
TierNolan
Legendary
*
Offline Offline

Activity: 1232
Merit: 1077


View Profile
May 10, 2013, 09:22:51 AM
 #292

Could anyone let us know the current progress in implementation of this idea?

I was thinking of looking into it "soon", but I have lots of other stuff going on.  My thoughts are that it should be a distributed verification system plus a distributed hash table.

The official client is going down the path of not allowing random transaction lookup, so the DHT is needed to support that.

Each node would randomly select transactions to verify.  You might set your node to verify only 1% of all transactions (p = 0.01).  When you get a new block with N transactions, you would attempt to verify only p * N of them (though it would be random, so you might verify more or less than that).
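
A minimal sketch of that sampling rule (the verification callback is an illustrative stand-in):

Code:
import random

def sample_and_verify(block_txs, verify_tx, p=0.01):
    """Verify a random ~p fraction of a block's transactions."""
    for tx in block_txs:
        if random.random() < p:       # expected p * N verifications
            verify_tx(tx)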

My thoughts are that all nodes would verify all branches that they are aware of.  Orphans within say 10k of the end of the chain would be verified.

It just marks blocks as valid or invalid.

The distributed hash table needs to store all transactions, and also all internal nodes of all trees that are in use.  It maps hash -> child nodes.

When you connect to a node, you tell it what you think is the end of the main chain.  You also give the last 10 blocks and nodes along the way.

For each location, you give

- hash of main chain header
- hash of UTXO root root

You would also be monitoring the main chain so you can find the chain with the longest POW.

You can then try to find the fork points (since the power-of-2 increase is relative to the start, all nodes would give the same values).  POW disagreements can be fixed by proving to one of the nodes that it isn't on the longest branch.

This should leave all nodes either agreeing or disagreeing based purely on validation.  You ask both nodes to prove the other node's block is invalid.  If a node won't switch to the longest POW branch, then you ask it to prove why.

This means that all nodes should keep a record of valid block headers (i.e. ones that meet POW) along with the proof that they are actually invalid blocks.  This shouldn't happen that often, since creating a valid block header for an invalid block is expensive.

This means that it doesn't even need to be an alt chain.  It is just a system where proof about invalid blocks is stored and shared.

1LxbG5cKXzTwZg9mjL3gaRE835uNQEteWF
Evan
Hero Member
*****
Offline Offline

Activity: 507
Merit: 500



View Profile
May 10, 2013, 07:51:23 PM
 #293

Quote
-snip- (the full opening post, quoted in its entirety)
Have you seen my topic?  https://bitcointalk.org/index.php?topic=194471.0;topicseen  We should talk.

I am poor, but i do work for Coin Smiley
1PtHcavXoakgNkQfEQdvnvEksEY2NvwaLM
ThomasV
Legendary
*
Offline Offline

Activity: 1896
Merit: 1343



View Profile WWW
May 11, 2013, 11:33:49 AM
 #294

I have started to experiment with this idea.
My goal is to add this hash tree to Electrum.

Each "numChildren" value (after the SumValue) can be exactly one byte, because you never have more than 256 ptrs, and each child pointer is also exactly 1 byte.  If you want to jump to a particular child, for instance, you are at node "11" and want to go the child at 3, you simply do iter->Seek("11"+"3") and it will skip "1122" and put the iterator right at "1137", which is the first database value >= "113".

Pointers can also be encoded as bits, using a fixed-size 32 bytes vector (assuming 256 pointers).
Of course variable-length storage would be more efficient, because most nodes will have sparse children, but I don't know if it is really worth the effort.
Indeed, keys will take up to 20 bytes, and node hashes will take 32 bytes anyway, so we're not adding an order of magnitude by using 32 bytes.
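
A minimal sketch of that fixed-size encoding (one bit per possible child byte, 256 bits = 32 bytes):

Code:
def encode_child_bits(child_bytes):
    """Pack 'which children exist' into a 32-byte bit vector."""
    bits = 0
    for b in child_bytes:
        bits |= 1 << b
    return bits.to_bytes(32, 'little')

def has_child(vec, b):
    return bool((int.from_bytes(vec, 'little') >> b) & 1)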


Quote
Furthermore, you might be able to get away without even any pointers!  You might just store the node/leaf hash and value, and know about children after the fact, simply by continuing your iteration.  You are at IterA, and IterB=IterA.Next().   You know that IterB is a child node of IterA because IterB.key().startswith(IterA.key()).   That's stupid simple.  

So, you know what level you're at simply by looking at Iter.size()
So, you know that you are a child because IterNext.key().startswith(IterPrev.key()).
If the previous check fails, you know you finished traversing that branch and you can update IterPrev.

Though, there may be something I'm missing that would still require you to store the pointers.  But it's still a lot better than storing 6-8 bytes per pointer, which is where I originally thought the bulk of the data was going to end up.

You can indeed do it without pointers, but iterating to find the children of a node can be very long.
And you will need to find the children of a node every time you update its hash.



Electrum: the convenience of a web wallet, without the risks
Come-from-Beyond
Legendary
*
Offline Offline

Activity: 2142
Merit: 1009

Newbie


View Profile
May 11, 2013, 11:39:28 AM
 #295

After lots and lots of discussion and debate, I believe that the address index should be maintained as a trie-like structure.

It's possible to create a transaction that has no address at all. What is considered the address in this case?
etotheipi (OP)
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1093


Core Armory Developer


View Profile WWW
May 11, 2013, 04:54:36 PM
 #296

I have started to experiment with this idea.
My goal is to add this hash tree to Electrum.

Each "numChildren" value (after the SumValue) can be exactly one byte, because you never have more than 256 ptrs, and each child pointer is also exactly 1 byte.  If you want to jump to a particular child, for instance, you are at node "11" and want to go the child at 3, you simply do iter->Seek("11"+"3") and it will skip "1122" and put the iterator right at "1137", which is the first database value >= "113".

Pointers can also be encoded as bits, using a fixed-size 32 bytes vector (assuming 256 pointers).
Of course variable-length storage would be more efficient, because most nodes will have sparse children, but I don't know if it is really worth the effort.
Indeed, keys will take up to 20 bytes, and node hashes will take 32 bytes anyway, so we're not adding an order of magnitude by using 32 bytes.

Quote
Furthermore, you might be able to get away without even any pointers!  You might just store the node/leaf hash and value, and know about children after the fact, simply by continuing your iteration.  You are at IterA, and IterB=IterA.Next().   You know that IterB is a child node of IterA because IterB.key().startswith(IterA.key()).   That's stupid simple.  

So, you know what level you're at simply by looking at Iter.size()
So, you know that you are a child because IterNext.key().startswith(IterPrev.key()).
If the previous check fails, you know you finished traversing that branch and you can update IterPrev.

Though, there may be something I'm missing that would still require you to store the pointers.  But it's still a lot better than storing 6-8 bytes per pointer, which is where I originally thought the bulk of the data was going to end up.

You can indeed do it without pointers, but iterating to find the children of a node can be very long.
And you will need to find the children of a node every time you update its hash.

My point was that you don't need any pointers at all, and finding the children isn't actually that slow, since the database is efficient at these kinds of operations.  If you are node "ABCD" and want to go to child P, you don't need a pointer to know how to get there.  Just iter->Seek("ABCDP") and you'll end up at the first element equal to or greater than it.  At the deeper levels, the iterators will efficiently seek directly in front of themselves, and may already have your next target in cache. 

If it starts with "ABCD" you know you are still in a child of ABCD, and if not, you know you are in a parallel branch and can finish processing the "ABCD" node.  Yes, there may be a lot of seek operations, but with the built-in optimizations, there's a very good chance that they will be fast, and because it's a PATRICIA tree, you'll rarely be doing more than 6 such operations to get the branch updated. 

On the other hand, I haven't thought this through thoroughly.  I only know that it seems like you can avoid the pointers altogether, which I was expecting to make up the bulk of the storage overhead.  I.e., each node currently will only hold a sum (8 bytes) and its own hash (32 bytes).  If you need the pointers, you could end up with 256 8-byte pointers per node in addition, which is actually quite heavy at the higher, denser levels. 

After lots and lots of discussion and debate, I believe that the address index should be maintained as a trie-like structure.

It's possible to create a transaction that has no address at all. What is considered the address in this case?

There's a little room for negotiation on this topic, but ultimately an "address" is a TxOut script.  In a totally naive world, your "addresses" would just be the exact serialization of the TxOut script -- so a 25-byte "address" for each standard Pay2Hash160 script, 35 or 67 bytes for pay-to-public-key scripts, and 23 bytes for a P2SH script.  And then anything that is non-standard would simply be serialized raw.

However, I don't like this, because a single address ends up with multiple equivalent representations.  Even though pay-to-public-key scripts are rare, there are addresses that use both (such as multi-use addresses that were used for mining and regular transactions).  Even though it's rare, you'd have to ask your peers for 2 different scripts per address (the Pay2Hash160 and PayToPubKey scripts).  I'd almost prefer making special cases for these addresses, given that they are so standard and fundamental to Bitcoin transactions.

So, I would vote for:

{Pay2Hash160, Pay2PubKey65, Pay2PubKey33} all be serialized as 21 bytes:  0x00 + Hash160.  Any Pay2PubKey variants will be bundled under that single key.
{P2SH} scripts will be serialized as 21 bytes:  0x05 + Hash160{script}. 
{EverythingElse} Will simply be the raw script. 

One problem I see with this is that it doesn't make it clean to adopt new standard scripts without reconstructing the database in the future.  I suppose it wouldn't be the end of the world, but we also don't want to make an inflexible protocol decision.  This isn't just personal preference for storing addresses/scripts; it's actually describing the authenticated structure of the Reiner-tree.  So if we were to add a new standard script type and wanted a short form of it to store in the DB, we'd have to update the "protocol".  If this had been adopted already, that would be a hard fork.   If we just do raw scripts all around, this isn't really a problem, except that we may have to ask for extra branches to make sure we get all possible variants of a single public key.
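
A minimal sketch of the proposed key rule (the 0x00/0x05 prefixes follow the vote above; the script templates are the standard ones, and hashlib's ripemd160 support is an assumption about the local OpenSSL build):

Code:
import hashlib

def hash160(b):
    return hashlib.new('ripemd160', hashlib.sha256(b).digest()).digest()

def db_key(script):
    # P2PKH: OP_DUP OP_HASH160 <20 bytes> OP_EQUALVERIFY OP_CHECKSIG
    if len(script) == 25 and script[:3] == b'\x76\xa9\x14' and script[23:] == b'\x88\xac':
        return b'\x00' + script[3:23]
    # P2PK: <33- or 65-byte pubkey> OP_CHECKSIG -- bundled under the same key
    if len(script) in (35, 67) and script[-1:] == b'\xac':
        return b'\x00' + hash160(script[1:-1])
    # P2SH: OP_HASH160 <20 bytes> OP_EQUAL
    if len(script) == 23 and script[:2] == b'\xa9\x14' and script[-1:] == b'\x87':
        return b'\x05' + script[2:22]
    return script  # everything else: the raw script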


@ ThomasV

I noticed you asked something about "SumValue" before.  I don't know if you got the question answered, but the idea was to recursively store the sum-of-value of each sub-branch, and have it authenticated along with the hashes.  Quite a few users, including gmaxwell (who was originally only lukewarm on this whole idea), determined that this was an extremely valuable addition to the spec, to deal with miners who lie about their reward, knowing that the network is made up almost entirely of lite nodes who have no way to determine otherwise.  But those lite nodes know what the total coins in circulation should be, and thus would only have to look at the root sum-value to determine if someone cheated. 

I don't know if I completely captured that concept.  I'm sure someone like gmaxwell or d'aniel can jump in and explain it better.  But it is an easy add-on to the original idea.  And it also makes it possible to simply query your balance without downloading all the raw TxOuts (though, if you are using each address once, that doesn't actually save you a lot).


Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1011


View Profile
May 12, 2013, 03:10:44 PM
 #297

So, I would vote for:

{Pay2Hash160, Pay2PubKey65, Pay2PubKey33} all be serialized as 21 bytes:  0x00 + Hash160.  Any Pay2PubKey variants will be bundled under that single key.
{P2SH} scripts will be serialized as 21 bytes:  0x05 + Hash160{script}. 
{EverythingElse} Will simply be the raw script. 

One problem I see with this is that it doesn't make it clean to adopt new standard scripts, without reconstructing the database in the future...

Why not hash160(txout.scriptPubKey)? I had assumed from the beginning that's what we'd be doing. "Addresses" are a UI issue - the protocol should only concern itself with scripts.

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
ThomasV
Legendary
*
Offline Offline

Activity: 1896
Merit: 1343



View Profile WWW
May 12, 2013, 04:16:44 PM
 #298

So, I would vote for:

{Pay2Hash160, Pay2PubKey65, Pay2PubKey33} all be serialized as 21 bytes:  0x00 + Hash160.  Any Pay2PubKey variants will be bundled under that single key.
{P2SH} scripts will be serialized as 21 bytes:  0x05 + Hash160{script}. 
{EverythingElse} Will simply be the raw script. 

One problem I see with this is that it doesn't make it clean to adopt new standard scripts, without reconstructing the database in the future...

Why not hash160(txout.scriptPubKey)? I had assumed from the beginning that's what we'd be doing. "Addresses" are a UI issue - the protocol should only concern itself with scripts.

+1
this is also what I have assumed

Electrum: the convenience of a web wallet, without the risks
ThomasV
Legendary
*
Offline Offline

Activity: 1896
Merit: 1343



View Profile WWW
May 12, 2013, 04:22:40 PM
 #299

My point was that you don't need any pointers at all, and finding the children isn't actually that slow, since the database is efficient at these kinds of operations.  If you are node "ABCD" and want to go to child P, you don't need a pointer to know how to get there.  Just iter->Seek("ABCDP") and you'll end up at the first element equal to or greater than it.  At the deeper levels, the iterators will efficiently seek directly in front of themselves, and may already have your next target in cache.  

If it starts with "ABCD" you know you are still in a child of ABCD, and if not, you know you are in a parallel branch and can finish processing the "ABCD" node.  Yes, there may be a lot of seek operations, but with the built-in optimizations, there's a very good chance that they will be fast, and because it's a PATRICIA tree, you'll rarely be doing more than 6 such operations to get the branch updated.  

no, you need to know the list of children in order to compute the hash of a node.
if you don't store pointers at all, you'll need to perform 256 iter.seek() and iter.next() operations per node, only to know its list of children

Quote
On the other hand, I haven't thought this through thoroughly.  I only know that it seems like you can avoid the pointers altogether which I was expecting to make up the bulk of the storage overhead.  i.e. each node currently will only hold a sum (8 bytes) and its own hash (32 bytes).  If you need the pointers, you could end up 256, 8-byte pointers per node in addition to it, which is actually quite heavy at the higher, denser levels.  

you only need 1 bit per pointer (true iff a child node exists), that's 32 bytes.

Electrum: the convenience of a web wallet, without the risks
etotheipi (OP)
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1093


Core Armory Developer


View Profile WWW
May 12, 2013, 04:33:49 PM
Last edit: May 12, 2013, 05:00:14 PM by etotheipi
 #300

My point was that you don't need any pointers at all, and finding the children isn't actually that slow, since the database is efficient at these kinds of operations.  If you are node "ABCD" and want to go to child P, you don't need a pointer to know how to get there.  Just iter->Seek("ABCDP") and you'll end up at the first element equal to or greater than it.  At the deeper levels, the iterators will efficiently seek directly in front of themselves, and may already have your next target in cache.  

If it starts with "ABCD" you know you are still in a child of ABCD, and if not, you know you are in a parallel branch and can finish processing the "ABCD" node.  Yes, there may be a lot of seek operations, but with the built-in optimizations, there's a very good chance that they will be fast, and because it's a PATRICIA tree, you'll rarely be doing more than 6 such operations to get the branch updated.  

no, you need to know the list of children in order to compute the hash of a node.
if you don't store pointers at all, you'll need to perform 256 iter.seek() and iter.next() operations per node, only to know its list of children

I don't think so.  If you are at ABCD and it has only 3 children, "ABCDE", "ABCDP" and "ABCDZ", there are still only 3 seeks.  You seek for "ABCDA", and the iterator ends up at ABCDE (which is the first element equal to or greater than your seek value).  So you know that's the first child, and that there's no point in seeking for "ABCDB", "ABCDC", etc.  Then your next seek is "ABCDF", which puts you at "ABCDP".  Rinse and repeat.  
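
A minimal sketch of that walk, reusing a stdlib stand-in for the sorted DB (keys are illustrative; one seek per child):

Code:
import bisect

db = sorted(["ABCDE1", "ABCDP7", "ABCDZ2", "ABD000"])

def seek(target):
    """Return the first key >= target, like LevelDB's Seek()."""
    i = bisect.bisect_left(db, target)
    return db[i] if i < len(db) else None

def children(prefix):
    found, cursor = [], prefix
    while True:
        key = seek(cursor)
        if key is None or not key.startswith(prefix):
            return found                       # left the subtree: done
        child = key[len(prefix)]
        found.append(child)
        cursor = prefix + chr(ord(child) + 1)  # skip past this child

print(children("ABCD"))   # -> ['E', 'P', 'Z'] in three seeks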


Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
ThomasV
Legendary
*
Offline Offline

Activity: 1896
Merit: 1343



View Profile WWW
May 12, 2013, 05:01:55 PM
 #301

I don't think so.  If you are at ABCD and it has only 3 children, "ABCDE", "ABCDP" and "ABCDZ", there are still only 3 seeks.  You seek for "ABCDA", and the iterator ends up at ABCDE (which is the first element equal to or greater than your seek value).  So you know that's the first child, and that there's no point in seeking for "ABCDB", "ABCDC", etc.  Then your next seek is "ABCDF", which puts you at "ABCDP".  Rinse and repeat.
oh indeed, I did not see that. thank you

Electrum: the convenience of a web wallet, without the risks
lunarboy
Hero Member
*****
Offline Offline

Activity: 544
Merit: 500



View Profile
May 13, 2013, 04:58:00 PM
 #302

Fascinating. Logical. Influential... Open-source development is such a privilege to watch.
maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1011


View Profile
May 13, 2013, 08:36:10 PM
 #303

Since @etotheipi is occupied for the next half-year and since I have a large interest in this proposal, I am offering my services to help make it happen. I have created a new thread with specific details of my proposal:

https://bitcointalk.org/index.php?topic=204283.msg2135237#msg2135237

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
2112
Legendary
*
Offline Offline

Activity: 2128
Merit: 1060



View Profile
May 14, 2013, 05:19:45 AM
 #304

I originally wrote this post a couple of days ago in response to another humorous/pithy comment. However, the original comment got deleted before I finished editing, and my reply made almost no sense afterwards. It appears that this thread is now approaching its end, and I decided to post a re-edited version to put it on permanent public record.

The original Greenspun's tenth rule of programming states:
Quote
Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.
I propose a modified version of it pertaining to Bitcoin and other blockchain-based cryptocurrencies:
Quote
Any sufficiently complete attempt at implementing Bitcoin will contain an ad hoc, informally-specified, bug-ridden, slow implementation of half of MUMPS.

I'm not trying to suggest that MUMPS should be used to implement Bitcoin. I'm just observing that a tremendous amount of work will be expended to re-invent and re-implement one key feature of MUMPS: larger-than-core sparse hierarchical tree storage with all the expected ACID properties.

For those who are interested in why this technology from circa 1975 outperforms all previous attempts at blockchain storage (both full and pruned), I have the following links:


It is a shame that I don't know of any open-source software that is compatible with the MIT license and other requirements specific to Bitcoin.

Please comment, critique, criticize or ridicule BIP 2112: https://bitcointalk.org/index.php?topic=54382.0
Long-term mining prognosis: https://bitcointalk.org/index.php?topic=91101.0
Peter Todd
Legendary
*
expert
Offline Offline

Activity: 1120
Merit: 1147


View Profile
May 14, 2013, 05:33:11 AM
 #305

In case anyone misses 2112's remarkably obscure troll with regard to MUMPS, see this: http://thedailywtf.com/Comments/A_Case_of_the_MUMPS.aspx

Sukrim
Legendary
*
Offline Offline

Activity: 2618
Merit: 1006


View Profile
May 14, 2013, 10:33:22 AM
 #306

I might start mining again if this gets implemented! Smiley
Also, as long as a UTXO block is merge-mined every few hours or even days, this is still much better/faster than the current situation with checkpoints + the bootstrap.dat torrent.

By the way, thanks for not rushing this and taking the time to debate proper solutions: in my opinion this is one thing that can drive forward bitcoin adoption a LOT more than another porn site accepting it! The frustration and confusion when starting a full client for the first time is surely one of the major reasons why this is still seen as a "geek tool". It is important to get this right on the first try.

https://www.coinlend.org <-- automated lending at various exchanges.
https://www.bitfinex.com <-- Trade BTC for other currencies and vice versa.
maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1011


View Profile
May 15, 2013, 06:58:08 PM
 #307

Quote
I don't know all the details of the proposals, but let me explain it the way I see it and then you people tell me what I'm missing.
I propose "trust levels" as the name. For simplicity, I'm assuming the root of the UTXO tree is part of the headers, not just merged-mined, but we can argue about that later.

Trust level 2: The current Simplified payment verification.

Trust level 1: We define a distance in blocks into the past which is secure for the node to assume won't suffer a reorg, say 1000 blocks ago, and call it the "pseudo-checkpoint". He downloads the full UTXO tree that was in block (currentHeight-1000), the UTXO tree at the pseudo-checkpoint. He downloads 1000 blocks and reproduces the current UTXO tree from the old one. For a given unspent output, he can provide its full merkle branch up to the coinbase, OR up to the pseudo-checkpoint if the coinbase is older.

Trust level 0: The node downloads the rest of the chain, becoming able to verify that the UTXO tree from the pseudo-checkpoint he used was correct, and from then on he will always be able to provide full output merkle branches to their coinbases.

You could download the whole chain from the top to the bottom and optimistically start by assuming the last UTXO tree was legit, then validate it against the previous one and the transactions of the last block, and so on back to the genesis - effectively decreasing the pseudo-checkpoint height from the last block created down to genesis.
Once you've achieved trust 0, nothing forces you to store the whole chain. You can then set another pseudo-checkpoint, only being afraid of reorgs and/or not being able to provide long enough output chains.
In fact, you can go from trust level 1, or even trust level 0, back to operating at level 2 if you want. You can download the whole chain but then only keep the branches of your own outputs.

Nodes would have more room for specialization. I think there will always be "librarian nodes", à la block explorers, distributing the whole history. But I think their potential lack is what worries @Sukrim.

After this, I think you can only improve the storage of the UTXOs (and possibly their full merkle branches) using caches.

Is this basically what we're talking about or am I lost and the name "trust levels" is awful?

@jtimon, you are essentially correct although what you describe is only part of the story. I think “trust level” is an appropriate term. Here's how I would lay them out:

Level 4: Electrum-like client/server. Keys are stored on the client, but the client trusts the server for information about the unspent-TxOut set. This is only marginally better than sticking your coins in a server-side wallet. I would never make a general recommendation to operate at this level unless you own both the server and the client and have a secure, authenticated connection between the two.

Level 3: Simplified payment verification. Client trusts the “longest” (most work) chain, but reads and processes every block. The client must scan the contents of every block more recent than the oldest address in its wallet in order to be sure that its record of unspent wallet outputs is correct. BitcoinJ has some optimizations not mentioned, but only because they make simplifying assumptions about the circumstances under which wallet transactions might be generated. It remains true that you must touch every transaction of every block that might have a wallet input or output within it.

Both of the above levels will be completely obsoleted by this proposal.

Level 2: The client downloads the “longest” (most work) meta-chain, and retrieves the block of data associated with the current head of the meta-chain. This data includes the Merkle hash of the unspent-TxOut trie and its associated block height. The client then queries any node exposing the correct service bits about its wallet addresses, retrieving either the associated transaction outputs with proof-of-inclusion, or a negative proof showing that no such output exists in the trie. I call this enhanced simplified payment verification, or SPV+; it operates trust-free at an economic security level equal to the hash power of the merged-mined meta-chain.

Level 1: The meta-chain data block probably will include other information, such as a deterministic bittorrent infohash of the serialized unspent-TxOut trie and blockchain checkpoint data. The client downloads the unspent-TxOut trie torrent and verifies that its root hash matches what was in the meta-chain. It then reconstructs the information necessary to do full-block validation from that point forward. The initial synchronization is at the meta-chain level of economic security, but after that it would take a 51% attack on bitcoin itself to subvert a client at this level.

Level 0: The client either verifies the entire chain forwards from the genesis block, or backwards up to the most recent checkpoint as @jtimon described. It is now a fully-validated client operating exactly as the Satoshi client does today, and with the same level of security.

There is also a “0.5” mode that might be good enough for most people: only verify backwards far enough to satisfy your own security requirements (a configurable number of blocks, perhaps set by a command-line parameter).
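To make level 2 concrete, here's a minimal sketch of the client-side check (Python; the exact commitment structure and serialization are still open questions in this thread, so the helper names and the (sibling, is_right) proof layout are my own assumptions):

Code:
import hashlib

def h(data):
    # double-SHA256, as bitcoin uses for its merkle trees
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def verify_inclusion(utxo_record, branch, trie_root):
    """utxo_record: serialized unspent-TxOut data from an untrusted peer
    branch:      list of (sibling_hash, sibling_is_right) pairs, leaf to root
    trie_root:   the commitment taken from the current meta-chain head"""
    node = h(utxo_record)
    for sibling, sibling_is_right in branch:
        node = h(node + sibling) if sibling_is_right else h(sibling + node)
    return node == trie_root

A level-2 client would run this once per wallet output it is handed; a failed check means the peer lied or the client's meta-chain head is stale. Negative (exclusion) proofs would work the same way, folding up from the empty slot or neighboring leaves instead of the record itself.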

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
socrates1024
Full Member
***
Offline Offline

Activity: 126
Merit: 108


Andrew Miller


View Profile
May 21, 2013, 10:41:33 PM
 #308

IMO your 3-point plan is missing out on the best aspect of having a Merkle UTXO. It's unnecessary for Level 2 to be a weaker-security "SPV+"; it can still do Full Validation even without having to download the whole chain. The reason is that you can check EVERY TX IN A BLOCK (not just the ones involving your wallet) just by knowing the root hash of the UTXO set and requesting short, checkable proofs from untrusted nodes.

At one point I convinced etotheipi of how this works, and he basically said he hadn't realized it would be possible. https://bitcointalk.org/index.php?topic=101734.0 I made a reference implementation using a red-black tree, but it would be totally fine to substitute a Merkle patricia trie for it.
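To sketch why this works (toy code, not the red-black tree of the reference implementation; I'm using a fixed-depth sparse Merkle tree because it makes "recompute the root after an update" trivial to show, and the EMPTY sentinel and proof layout are assumptions of this sketch):

Code:
import hashlib

DEPTH = 256             # one tree level per bit of the hashed key
EMPTY = b'\x00' * 32    # sentinel hash for an absent leaf (my assumption)

def h(data):
    return hashlib.sha256(data).digest()

def path_bits(key):
    n = int.from_bytes(h(key), 'big')
    return [(n >> (DEPTH - 1 - i)) & 1 for i in range(DEPTH)]

def root_from_branch(key, leaf_hash, siblings):
    """Fold a leaf and its DEPTH sibling hashes (listed root-to-leaf)
    up into the root such a tree would have."""
    node = leaf_hash
    for bit, sib in zip(reversed(path_bits(key)), reversed(siblings)):
        node = h(sib + node) if bit else h(node + sib)
    return node

def process_tx(root, spends, creates):
    """spends/creates: lists of (key, value, siblings).  The same branch
    that proves a UTXO exists also yields the root after changing it."""
    for key, value, sibs in spends:
        assert root_from_branch(key, h(value), sibs) == root, "missing UTXO"
        root = root_from_branch(key, EMPTY, sibs)     # delete spent output
    for key, value, sibs in creates:
        assert root_from_branch(key, EMPTY, sibs) == root, "slot not empty"
        root = root_from_branch(key, h(value), sibs)  # insert new output
    return root

One honest caveat: successive updates change sibling hashes, so branches for later inputs have to be adjusted as the root evolves; the reference implementation linked above handles that bookkeeping. The point stands, though - a validator holding only the root can fully check a block, given short proofs from untrusted peers.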

amiller on freenode / 19G6VFcV1qZJxe3Swn28xz3F8gDKTznwEM
[my twitter] [research@umd]
I study Merkle trees, credit networks, and Byzantine Consensus algorithms.
maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1011


View Profile
May 22, 2013, 12:07:24 AM
 #309

I'm not doubting that it's possible, just wondering if you'd actually be gaining anything. I suppose the application would be for devices with suitable bandwidth but very tight memory constraints, like hardware wallets?

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
etotheipi (OP)
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1093


Core Armory Developer


View Profile WWW
May 22, 2013, 05:21:37 AM
 #310

I'm not doubting that it's possible, just wondering if you'd actually be gaining anything. I suppose the application would be for devices with suitable bandwidth but very tight memory constraints, like hardware wallets?

If I understand (remember) correctly, Socrates1024 is making the point that someone could theoretically do mining without having the full blockchain.  All they need is the root hash of the last block, and to see the full blocks coming in.  This is because you can verify that the UTXOs being spent by the new transactions are unspent with a simple branch request from a peer.

Right now, if you wanted to do this... you can't.  You can easily prove that a UTXO once existed, but you can't know whether it has been spent since then without downloading all the intervening blocks and verifying there were no spends.  But with this UTXO tree structure, you can prove both inclusion and exclusion of UTXOs at any given block.  It may not be that big of a distinction today, but perhaps in the future, when full history requires TB of storage, that could make a difference.

But I do question how much is gained -- only some limited fraction of the network could operate like this without becoming extremely burdensome to the actual full nodes.  And I'm not confident that these lite nodes could expect reliable branch information from each other, even if they all "agreed" to hold, say, 1% of the full thing.  I certainly wouldn't want to risk my miner going idle because it's having trouble finding some subset of the tree (or large subsets of the network going dark for that reason).

Have I understood this properly?



One thing that has come up before is that I'd like to see an additional piece of information added to the header: fast-forward data for each block.  sipa has already implemented undo data, because you need it in the event of a reorg, and I tried to convince him he might as well include fast-forward data too, because you save some 75% of the bandwidth on transmission of block data (to those that request it).  If the OutPoints-removed-and-TxOuts-added data is organized into a merkle tree, then that root could be included in the header, and such UTXO-only nodes would be able to avoid pulling whole blocks.  sipa didn't like the complexity for a "constant" factor, but I think 75% is a worthy constant factor.  Unless I'm missing something...
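For illustration, committing to the fast-forward data could be as simple as the following sketch (the 'del:'/'add:' tagging and the tree layout are mine, purely illustrative, not a spec):

Code:
import hashlib

def h(data):
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def merkle_root(leaves):
    """Plain bitcoin-style merkle root (odd layers duplicate the last hash)."""
    if not leaves:
        return h(b'')
    layer = [h(leaf) for leaf in leaves]
    while len(layer) > 1:
        if len(layer) % 2:
            layer.append(layer[-1])
        layer = [h(layer[i] + layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0]

def fast_forward_root(spent_outpoints, created_txouts):
    """Commit to exactly the delta a UTXO-only node needs to apply a block:
    the OutPoints removed and the serialized TxOuts added."""
    removed = [b'del:' + txid + idx.to_bytes(4, 'little')
               for (txid, idx) in spent_outpoints]
    added = [b'add:' + txout for txout in created_txouts]
    return merkle_root(removed + added)

A UTXO-only node that trusts the header (or the meta-chain) could then fetch just this committed delta instead of the whole block, which is where the claimed ~75% saving would come from.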

Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
TierNolan
Legendary
*
Offline Offline

Activity: 1232
Merit: 1077


View Profile
May 22, 2013, 09:58:24 AM
 #311

If I understand (remember) correctly, Socrates1024 is making the point that someone could theoretically do mining without having the full blockchain.  All they need is the root hash of the last block, and to see the full blocks coming in.  This is because you can verify that the UTXOs being spent by the new transactions are unspent with a simple branch request from a peer.

I think providing a way to pre-package blocks would help here.  You send the block you are mining in advance (subject to spam protection).

Another option is to allow transactions to be packaged.  You send a hash + 64 transactions.  Later, you can include all 64 just with the header hash.
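One way to read this, as a sketch (the message layout and caching behavior here are invented, just to pin down the idea):

Code:
import hashlib

def h(data):
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def package_hash(txids):
    """Pre-announced bundle: one hash committing to 64 txids in order."""
    assert len(txids) == 64
    return h(b''.join(txids))

packages = {}   # receiver-side cache of announced packages

def on_package(txids):
    packages[package_hash(txids)] = txids

def expand_package(pkg_hash):
    """A later block relay names all 64 transactions by one 32-byte hash."""
    txids = packages.get(pkg_hash)
    if txids is None:
        raise KeyError('unknown package - fall back to requesting full txs')
    return txids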

1LxbG5cKXzTwZg9mjL3gaRE835uNQEteWF
socrates1024
Full Member
***
Offline Offline

Activity: 126
Merit: 108


Andrew Miller


View Profile
May 22, 2013, 01:09:59 PM
 #312

I'm not doubting that it's possible, just wondering if you'd actually be gaining anything. I suppose the application would be for devices with suitable bandwidth but very tight memory constraints, like hardware wallets?

Sure, hardware wallets are an example. You could plug one into an untrusted host computer that has already downloaded the proof data. Or even a mobile phone wallet, where you don't have much bandwidth during the day so it's SPV, but maybe it could sync up with the network overnight when it's on untrusted wifi.

amiller on freenode / 19G6VFcV1qZJxe3Swn28xz3F8gDKTznwEM
[my twitter] [research@umd]
I study Merkle trees, credit networks, and Byzantine Consensus algorithms.
LvM
Full Member
***
Offline Offline

Activity: 126
Merit: 100


View Profile
May 22, 2013, 11:57:04 PM
Last edit: May 23, 2013, 01:03:33 PM by LvM
 #313


@etotheipi:
Is it REALLY impossible for each client
"to download just the unspent-TxOut list for each address in their wallet, and verify that list directly against the blockheaders."
...quickly and easily without your proposed complications with alt/meta-chains ?

I cannot believe that. If so, this seems - besides the "Cash/Change" concept*) - to be another remarkable bug at the lowest level.

*) CASH/CHANGE simulation vs. GAAP fundamentals
https://bitcointalk.org/index.php?topic=211835

BTC violates GAAP, the result is a MESS  https://bitcointalk.org/index.php?topic=211835.0
Requirements for a PROFESSIONAL BTC application https://bitcointalk.org/index.php?topic=189669
BANK SECRECY with BTC equals ZERO!? https://bitcointalk.org/index.php?topic=188383 Answer: yes, unless you pay devilishly close attention.
d'aniel
Sr. Member
****
Offline Offline

Activity: 461
Merit: 251


View Profile
May 23, 2013, 12:36:32 AM
Last edit: May 23, 2013, 01:17:57 AM by d'aniel
 #314

I'm not doubting that it's possible, just wondering if you'd actually be gaining anything. I suppose the application would be for devices with suitable bandwidth but very tight memory constraints, like hardware wallets?

Sure, hardware wallets are an example. You could plug one into an untrusted host computer that has already downloaded the proof data. Or even a mobile phone wallet, where you don't have much bandwidth during the day so it's SPV, but maybe it could sync up with the network overnight when it's on untrusted wifi.
The big value of validation-without-the-blockchain seems to me to come when it's combined with the fraud proof/challenge idea, where an SPV node can reject invalid blocks upon receiving a short proof or challenge that proves the block was formed incorrectly (e.g. nonexistent txin, double spend, invalid coinbase).  It turns out all important cases of fraudulently formed blocks can be covered by these - including cheating on the block reward, after a simple change to the way the Merkle trees are constructed.  This would greatly strengthen the trust model of an SPV node, under the assumption that it's connected to at least one well-connected, honest peer.

The more honest nodes there are auditing blocks and forming these fraud proofs/challenges, the better the chance that SPV nodes will promptly and reliably receive them.  This is where the validation-without-the-blockchain idea really shines IMO, since it allows resource-constrained nodes to contribute to overall network security without biting off more than they can chew - they only partially validate blocks.  I tend to think of this as the "correct" way to preserve decentralization while scaling up block sizes.
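For one concrete case - a spend of a nonexistent txin - the SPV-side check might look like this (a sketch only: FraudProof and the branch layout are invented, and I'm glossing over binding the exclusion branch to the outpoint's key, which a real proof must do):

Code:
import hashlib
from collections import namedtuple

EMPTY = b'\x00' * 32    # absent-leaf sentinel in the UTXO commitment

def h(data):
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def fold(leaf_hash, branch):
    node = leaf_hash
    for sibling, sibling_is_right in branch:
        node = h(node + sibling) if sibling_is_right else h(sibling + node)
    return node

FraudProof = namedtuple('FraudProof', 'raw_tx tx_branch utxo_branch')

def block_is_fraudulent(block_merkle_root, prev_utxo_root, proof):
    """Reject a block iff (a) it provably contains proof.raw_tx and
    (b) that tx's claimed input provably is NOT in the prior UTXO set.
    The SPV node never needs the block contents, just this short proof."""
    tx_in_block = fold(h(proof.raw_tx), proof.tx_branch) == block_merkle_root
    input_absent = fold(EMPTY, proof.utxo_branch) == prev_utxo_root
    return tx_in_block and input_absent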
jtimon
Legendary
*
Offline Offline

Activity: 1372
Merit: 1002


View Profile WWW
May 23, 2013, 11:42:21 AM
 #315

@maaku, you said yesterday we left an open conversation more or less here:

maaku: Metachain miners could get lazy and not maintain the indexes properly.
jtimon: If the indexes are in the main chain there wouldn't be such a problem.
maaku: In the main chain the thing gets worse.
jtimon: How is that possible? Blocks with wrong indexes will be orphaned.

Also, about socrates proposal. I think it's also useful for custom assets that can be issued, used and destroyed. New miners (or users) don't really care about these destroyed assets.

@LvM
I think the purpose of the "change" approach is to make obfuscation easier. You could do GAAP and maintain a sequence number for the transactions from each account like Ripple does.
You claim that would reduce the size of the chain by half. But how do you know how many keypairs people will create to obfuscate their finances?
Another problem is that you have to maintain a list of "existent accounts" forever. If miners forget about them and then a user uses one again (not a good idea on their part), the sequence for that keypair would be reset, and anyone who still has the old transactions can "replay" them. That's why you can't destroy accounts (funded keypairs) in Ripple. You may say "there's no replay, all BTC funds will be gone". Well, I'm thinking about people reissuing custom assets from "resurrected" accounts.
Well, probably we should go to your thread to discuss this anyway...

2 different forms of free-money: Freicoin (free of basic interest because it's perishable), Mutual credit (no interest because it's abundant)
maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1011


View Profile
May 23, 2013, 04:13:45 PM
 #316

@maaku, you said yesterday we left an open conversation more or less here:

maaku: Metachain miners could get lazy and not maintain the indexes properly.
jtimon: If the indexes are in the main chain there wouldn't be such a problem.
maaku: In the main chain the thing gets worse.
jtimon: How is that possible? Blocks with wrong indexes will be orphaned.

Actually we should let gmaxwell respond to this, as I was really just relaying his concerns expressed to me at the conference. I think perhaps he is considering the case where the information is included in the bitcoin coinbase, but not actually enforced as a protocol rule?

Also, about socrates proposal. I think it's also useful for custom assets that can be issued, used and destroyed. New miners (or users) don't really care about these destroyed assets.

True except for the miner part. You can fork the blockchain just as easily with invalid asset transactions.

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
jtimon
Legendary
*
Offline Offline

Activity: 1372
Merit: 1002


View Profile WWW
May 26, 2013, 02:37:43 PM
 #317

Actually we should let gmaxwell respond to this, as I was really just relaying his concerns expressed to me at the conference. I think perhaps he is considering the case where the information is included in the bitcoin coinbase, but not actually enforced as a protocol rule?

I'm really interested. If the indexes are enforced by the protocol I see no problem.

Also, about socrates proposal. I think it's also useful for custom assets that can be issued, used and destroyed. New miners (or users) don't really care about these destroyed assets.

True except for the miner part. You can fork the blockchain just as easily with invalid asset transactions.

I don't understand.
If I validate the full chain leaving out the ones that nobody says still exist and only download the hashes of the transactions involved...
Never mind. Once a dead asset appears alongside living assets in the same transaction, you also have to validate the dead asset in full, recursively.
So probably miners need to validate dead assets anyway - am I thinking this straight, or did I get lost in the recursion?

Anyway, asking for the full proofs on the whole current UTXO would let you mine at level 1.

1) You ask for the longest header chain.

2) You ask for the UTXO

3) You ask for the full proofs of all outputs, not just yours. Maybe starting with yours.

4) Maybe in parallel, or only with wifi, or whatever, you download the whole chain from the in-chain signed torrent.
You validate everything up to "your checkpoint" independently.

What happens when there's a reorg?

1) You change "your checkpoint" to the last common block, so you ask for that block's UTXO

2) You ask for the new UTXO at the top

3) You ask for the missing proofs you lack for the new UTXO.

Hummh, maybe GAAP is also better for reorgs...
And in fact I already solved the "immortal accounts" problem on the new Ripple forum by replacing the sequence number with the hash of the previous transaction from that account.
LvM is right: GAAP and a ledger instead of the UTXO set, like Ripple has, would make things much simpler.
Are there any obvious arguments in favor of the change-like design that I am missing?
Is there anything that makes the UTXO cooler than a simple ledger?
Anyway, that's too big of a change, but since nobody answered in LvM's thread and I'm doubting...
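A sketch of that chaining rule (names invented; the key point is that a (re)funded account starts its chain at the funding tx hash, so transactions from an earlier, pruned incarnation can never be replayed):

Code:
import hashlib

def h(data):
    return hashlib.sha256(data).digest()

def tx_hash(sender, prev_tx_hash, payload):
    return h(sender + prev_tx_hash + payload)

def fund(ledger, account, funding_tx_hash):
    # a (re)funded account's chain starts at the funding tx, not at zero,
    # so nothing signed for an earlier incarnation can ever match again
    ledger[account] = funding_tx_hash

def apply_spend(ledger, sender, prev_tx_hash, payload):
    """ledger: account -> hash of the last tx affecting that account.
    Unlike a sequence number, this value never resets to something reusable."""
    if ledger.get(sender) != prev_tx_hash:
        raise ValueError('stale or replayed transaction')
    ledger[sender] = tx_hash(sender, prev_tx_hash, payload)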

2 different forms of free-money: Freicoin (free of basic interest because it's perishable), Mutual credit (no interest because it's abundant)
maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1011


View Profile
May 26, 2013, 04:28:33 PM
 #318

I got confused following your logic, but the conclusion is right. Miners must validate everything or the whole system falls apart.

As for ledger vs. UTXO, these UTXO indices give the advantages of a ledger (being able to look up final balances, for example), while maintaining an underlying block structure. But that connection to the underlying system is important - you still need it if you are going to create new transactions, for example. Does that make sense?

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
aaaxn
Sr. Member
****
Offline Offline

Activity: 359
Merit: 250



View Profile
May 26, 2013, 08:04:42 PM
 #319

Another problem is that you have to maintain a list of "existent accounts" forever. If miners forget about them and then a user uses one again (not a good idea on their part), the sequence for that keypair would be reset, and anyone who still has the old transactions can "replay" them.
You can simply use the block number of the last account operation. There is no way you get the same sequence twice this way.

As for ledger vs. UTXO, I think this whole concept of outputs and scripts doesn't make sense; it should be dropped and replaced with account balances and account types (e.g. multisig accounts).


maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1011


View Profile
May 26, 2013, 08:10:43 PM
 #320

Another problem is that you have to maintain a list of "existent accounts" forever. If miners forget about them and then a user uses one again (not a good idea on their part), the sequence for that keypair would be reset, and anyone who still has the old transactions can "replay" them.

You can simply use the block number of the last account operation. There is no way you get the same sequence twice this way.

As for ledger vs. UTXO, I think this whole concept of outputs and scripts doesn't make sense; it should be dropped and replaced with account balances and account types (e.g. multisig accounts).

What doesn't make sense? Outputs and scripts are how bitcoin works.

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
aaaxn
Sr. Member
****
Offline Offline

Activity: 359
Merit: 250



View Profile
May 26, 2013, 08:15:42 PM
 #321

What doesn't make sense? Outputs and scripts are how bitcoin works.
Well, it is how bitcoin works, but if you're considering major changes anyway (I was referring to jtimon's post), I think this should be changed too, because it is not an optimal system architecture.


LvM
Full Member
***
Offline Offline

Activity: 126
Merit: 100


View Profile
May 26, 2013, 09:17:23 PM
Last edit: May 26, 2013, 09:33:35 PM by LvM
 #322



I think it's important that everybody understands this, and I didn't see anyone explaining it in general terms.

The Blockchain is a distributed computer file system containing a double-entry accounting ledger. Each transaction has two sides, which you may be familiar with from accounting: input and output, OR debits and credits. However, a major difference is that bitcoin forces a debit (input) to exist for every credit (output). Storing all of this takes a lot of space. extra explanation

This proposal will continuously "balance the books." In accounting, when you close out the books for the quarter, all of the debits and credits are summed, and the difference between the two is entered as a "balance" transaction. Because we know that bitcoin forces every credit (output) to have a debit (input), we only have to keep track of all credits (outputs) that are not claimed by a debit (input) to obtain the balance of each address.

The proposal is for a system to store the references to these unspent outputs in a data structure for quick downloading. It doesn't suggest how this tree would be updated efficiently, or how you would quickly grab all of the unspent outputs belonging to one address. This is under discussion.
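In code, the "unclaimed credits" idea described above is just this (a toy model that ignores scripts and treats each output as paying a single address):

Code:
def balances(transactions):
    """transactions: iterable of (txid, inputs, outputs), where inputs are
    (txid, index) outpoints and outputs are (address, amount) pairs."""
    utxo = {}                                  # outpoint -> (address, amount)
    for txid, inputs, outputs in transactions:
        for outpoint in inputs:                # every debit claims a credit
            utxo.pop(outpoint)                 # KeyError here = invalid spend
        for index, (address, amount) in enumerate(outputs):
            utxo[(txid, index)] = (address, amount)
    totals = {}
    for address, amount in utxo.values():
        totals[address] = totals.get(address, 0) + amount
    return totals                              # balances after closing the books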

@galambo
Yes. It's really important that everybody understands what you write about accounting!

But the Bitcoiners don't know anything about accounting, nothing about the balance of money paid and received. So, from the very beginning, they obstinately ignored all basic GAAP.

Instead, everything is filled with and buried behind the jargon and lingo of cryptologists, i.e. logically secondary questions.

They don't even know how to manage the size of the ever-growing BTC blockchain in the future
https://bitcointalk.org/index.php?topic=109467.0;all

Quote
In accounting, when you close out the books for the quarter all of the debits and credits are summed, and the difference between the two is entered as a "balance" transaction.

Correct! That's also called "carry-over".

Carry-overs need just ONE entry to transfer the last balance of each account into a new database,
leaving all previous (thousands or millions!) transactions in an "archive"
(where they can easily be found if really needed - normally almost never again).

Normally this year-end closing and these "carry-overs" are done annually (and are legally prescribed).

In BTC this could be done more often, i.e. whenever there are "too many" transactions in the database, making it too large for effective use by clients.

Using this quite normal way of bookkeeping, used everywhere, the actual "blockchain" could easily be kept small and handy and would not be bloated ad infinitum.

It is impossible, complete nonsense, to keep all transfers over, say, 5, 10, 100, 200... years in ONE database that must always be transferred and read in full.
But nowhere in BTC do I see any usable solution for this issue, which was foreseeable from the very inception.

For correct and efficient GAAP-conformant accounting, only the (slightly changed) transaction records are needed.
Details described here: https://bitcointalk.org/index.php?topic=211835

"Blocks"

Another, smaller problem is the "blocks" and their "miners", who want ever more undefined amounts of money for transactions as well.

First designed to produce new BTC, the creation of blocks is indeed meant to continue ad infinitum!! Even once all 21 million BTC are created.
This might be due to the present database structure - totally unusable in the long run,
and not easy, but unavoidable, to change!!

BTC violates GAAP, the result is a MESS  https://bitcointalk.org/index.php?topic=211835.0
Requirements for a PROFESSIONAL BTC application https://bitcointalk.org/index.php?topic=189669
BANK SECRECY with BTC equals ZERO!? https://bitcointalk.org/index.php?topic=188383 Answer: yes, unless you pay devilishly close attention.
LvM
Full Member
***
Offline Offline

Activity: 126
Merit: 100


View Profile
May 26, 2013, 09:21:57 PM
 #323

@Jtimon, aaan

Ripple indeed seems (at least as they describe it) to be MUCH better structured than BTC.
It could be taken as a model or prototype for a new BTC!!

Like any other accounting system, Ripple does not need or have these "blocks" and their "miners".
They instead produced all 100 billion Ripples, aka XRP, in advance and kept them for themselves.

"The Ripple founders created the initial Ripple ledger with 100 billion XRP. The founders gifted a for profit company called Opencoin 80 billion XRP. Opencoin intends to give away over 50 billion XRP. The remainder will be used to fund Opencoin operations, which include contributing code to the open source network and promoting the network."

https://ripple.com/wiki/Introduction_to_Ripple_for_Bitcoiners

But that's also just another reason why I do not really trust the Ripplers at the moment.
See also glitch003: https://bitcointalk.org/index.php?topic=179677.0;all

BTC violates GAAP, the result is a MESS  https://bitcointalk.org/index.php?topic=211835.0
Requirements for a PROFESSIONAL BTC application https://bitcointalk.org/index.php?topic=189669
BANK SECRECY with BTC equals ZERO!? https://bitcointalk.org/index.php?topic=188383 Answer: yes, unless you pay devilishly close attention.
LvM
Full Member
***
Offline Offline

Activity: 126
Merit: 100


View Profile
May 26, 2013, 09:26:07 PM
Last edit: May 26, 2013, 09:52:54 PM by LvM
 #324

So we cannot but hope that BTC will very soon be fundamentally improved.

As a BTC newbie, still having enormous problems understanding the really weird details, I cannot help rewrite the code, sorry.

But an expert like Etotheipi (author of Armory, the best BTC client of all) certainly could do it almost alone
(instead of wasting his precious time on his proposed workarounds for clients only).

The main problem for experts seems IMHO not to be the code, but the "community",
esp. "markets" like MtGox etc. and services like blockchain.info, blockexplorer.com... which would have to readjust their code accordingly.

But the new GAAP-conformant code would be much shorter and clearer than the present one, which is a clumsy mess already at the lowest level.

BTC violates GAAP, the result is a MESS  https://bitcointalk.org/index.php?topic=211835.0
Requirements for a PROFESSIONAL BTC application https://bitcointalk.org/index.php?topic=189669
BANK SECRECY with BTC equals ZERO!? https://bitcointalk.org/index.php?topic=188383 Answer: yes, unless you pay devilishly close attention.
maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1011


View Profile
May 26, 2013, 10:13:39 PM
 #325

LvM, I suggest you reread this proposal and contemplate its structure. It is much closer to what you are proposing than I think you understand it to be. I suggest you also look into why bitcoin is structured the way it is (addresses != accounts), as there are good reasons for that too. Then I suggest that you make another thread detailing your proposal, as we are wandering significantly off topic.

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
LvM
Full Member
***
Offline Offline

Activity: 126
Merit: 100


View Profile
May 27, 2013, 12:47:45 AM
 #326

LvM, I suggest you reread this proposal and contemplate its structure. It is much closer to what you are proposing than I think you understand it to be. I suggest you also look into why bitcoin is structured the way it is (addresses != accounts), as there are good reasons for that too. Then I suggest that you make another thread detailing your proposal, as we are wandering significantly off topic.

@maaku

You did not even try to understand what I wrote?
The fact that BTC has no GAAP-conformant accounts is exactly the problem I mean and have explained.

But do what you like, or better:
do what you have to do after collecting - thanks mainly to @evoorhees - >156 BTC for the first 3 months of your adventurous efforts.
Around 15,000 USD. Bravo!

https://bitcointalk.org/index.php?topic=204283.0;all
http://blockexplorer.com/address/13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP

BTC violates GAAP, the result is a MESS  https://bitcointalk.org/index.php?topic=211835.0
Requirements for a PROFESSIONAL BTC application https://bitcointalk.org/index.php?topic=189669
BANK SECRECY with BTC equals ZERO!? https://bitcointalk.org/index.php?topic=188383 Answer: yes, unless you pay devilishly close attention.
LvM
Full Member
***
Offline Offline

Activity: 126
Merit: 100


View Profile
May 27, 2013, 12:52:15 AM
 #327

@maaku

It will be very interesting to see if the result of your efforts is more than just a workaround that makes things even more confusing,
not really solving the underlying fundamental problems which I gratis Cheesy explained solely from the financial point of view.

As is normal in such superficially solved, muddle-through workaround cases, the same or similar problems will arise again.

I have made this experience again and again with my own code.
So this could become a lifetime job for you!
Congratulations! Cheesy

BTC violates GAAP, the result is a MESS  https://bitcointalk.org/index.php?topic=211835.0
Requirements for a PROFESSIONAL BTC application https://bitcointalk.org/index.php?topic=189669
BANK SECRECY with BTC equals ZERO!? https://bitcointalk.org/index.php?topic=188383 Answer: yes, unless you pay devilishly close attention.
Pieter Wuille
Legendary
*
qt
Offline Offline

Activity: 1072
Merit: 1170


View Profile WWW
May 27, 2013, 12:58:59 AM
 #328

Bitcoin at its core is not an accounting system; it is a currency. It implements this through its own ledger-like mechanism, though one that doesn't quite match what we expect from a ledger. And I believe that doesn't matter.

Accounting is something you do at the "payment level", not at the "currency level". Dollars have no built-in way to track historical balances, and I don't think anyone considers that a shortcoming. We can keep track of credits and debits regardless.

I'm sure that with all the developments in wallet software (multiple clients, payment protocol, ...), it's perfectly possible to create a GAAP-conformant system, or at least one that easily integrates with accounting systems.

I don't think discussion about that belongs in this thread though - this is about solving a low-level scalability problem, not about how we manage high-level use of it.

I do Bitcoin stuff.
Sukrim
Legendary
*
Offline Offline

Activity: 2618
Merit: 1006


View Profile
May 27, 2013, 07:57:56 AM
 #329

Also the "accounting solution" only helps as long as there are transactions like "1..n inputs pay to 1..n outputs". Bitcoin can have vastly more complex standard transactions and even more complex nonstandard transactions.

If you want to have a good accounting system, please start by improving ABE (the alternative block explorer) to actually even handle all currently existing transactions on the block chain and assign an account to each of them. Good luck with that.

https://www.coinlend.org <-- automated lending at various exchanges.
https://www.bitfinex.com <-- Trade BTC for other currencies and vice versa.
aaaxn
Sr. Member
****
Offline Offline

Activity: 359
Merit: 250



View Profile
May 27, 2013, 09:51:43 AM
 #330

Also the "accounting solution" only helps as long as there are transactions like "1..n inputs pay to 1..n outputs". Bitcoin can have vastly more complex standard transactions and even more complex nonstandard transactions.

If you want to have a good accounting system, please start by improving ABE (the alternative block explorer) to actually even handle all currently existing transactions on the block chain and assign an account to each of them. Good luck with that.
Yes, bitcoin can have a lot of complex transactions, which fall into 2 categories (not exclusively):
- things that are needed by 0.00001% of users
- things that must be blocked because they would bloat the blockchain too much


LvM
Full Member
***
Offline Offline

Activity: 126
Merit: 100


View Profile
May 27, 2013, 10:26:34 PM
 #331

Bitcoin at its core is not an accounting system; it is a currency. It implements this through its own ledger-like mechanism, though one that doesn't quite match what we expect from a ledger. And I believe that doesn't matter.

Accounting is something you do at the "payment level", not at the "currency level". Dollars have no built-in way to track historical balances, and I don't think anyone considers that a shortcoming. We can keep track of credits and debits regardless.

I'm sure that with all the developments in wallet software (multiple clients, payment protocol, ...), it's perfectly possible to create a GAAP-conformant system, or at least one that easily integrates with accounting systems.

I don't think discussion about that belongs in this thread though - this is about solving a low-level scalability problem, not about how we manage high-level use of it.

@Pieter Wuille

I would rather say that "accounting systems" and "currencies" are completely different things.
They cannot be compared, or treated/used as somehow interchangeable, as you insinuate.

Accounting is exactly the same for all currencies, including BTC!
For details see GAAP - not even mentioned once in the whole wiki.

https://en.bitcoin.it/w/index.php?search=GAAP&button=&title=Special%3ASearch

In BTC it's really easy.
We have to deal with only a very tiny and very easy part of GAAP,
i.e. the logically quite simple payment system only.

A ==>> B, B ==> C etc.

But basics forgotten and totally ignored at the lowest level can hardly be repaired at higher levels - here, the level of users/clients.
Going that way, the self-inflicted basic problems are perpetuated and increased; see the endlessly exploding blockchain (which could easily have been avoided by carry-overs, never even thought about - see GAAP).

So I can't help but hope and insist that all efforts are concentrated and invested in the BTC level itself, and not wasted on workarounds installing a complicated parallel system for things that should and could much more easily be done at the lower level.

Muck out all the cruft and endow BTC itself (not the clients) with a small, professional, fast-working payment system, simply and quickly usable by each client and THEIR accounting/bookkeeping systems!!

Brevity is the soul of wit. Cheesy
The more code and stuff the worse.

Small is beautiful !! Cheesy

BTC violates GAAP, the result is a MESS  https://bitcointalk.org/index.php?topic=211835.0
Requirements for a PROFESSIONAL BTC application https://bitcointalk.org/index.php?topic=189669
BANK SECRECY with BTC equals ZERO!? https://bitcointalk.org/index.php?topic=188383 Answer: yes, unless you pay devilishly close attention.
Pieter Wuille
Legendary
*
qt
Offline Offline

Activity: 1072
Merit: 1170


View Profile WWW
May 27, 2013, 10:38:13 PM
 #332

I completely disagree.

Yes, Bitcoin is exactly the same as other currencies in this respect. But we don't track serial numbers of dollar bills we get, do we? So why would we do accounting based on the currency?

Bitcoin transactions are only low-level and potentially complex movements of coins. For accounting, you don't care about individual coins. A real-world payment may be settled by several Bitcoin transactions, or a single Bitcoin transaction may be used to fulfill several payments. But nobody needs to care about this.

You don't care how transactions between business identities are settled. You don't care which coins or dollar bills are used, only between which entities they move. Stop seeing Bitcoin as a ledger - even though many wallet clients show it as such - it is not a ledger of "payments", but of "coin movements".


I do Bitcoin stuff.
aaaxn
Sr. Member
****
Offline Offline

Activity: 359
Merit: 250



View Profile
May 27, 2013, 11:04:48 PM
Last edit: May 27, 2013, 11:20:39 PM by aaaxn
 #333

I completely disagree.

Yes, Bitcoin is exactly the same as other currencies in this respect. But we don't track serial numbers of dollar bills we get, do we? So why would we do accounting based on the currency?
A gold coin doesn't need a serial number, does it?

Bitcoin transactions are only low-level and potentially complex movements of coins. For accounting, you don't care about individual coins. A real-world payment may be settled by several Bitcoin transactions, or a single Bitcoin transaction may be used to fulfill several payments. But nobody needs to care about this.

You don't care how transactions between business identities are settled. You don't care which coins or dollar bills are used, only between which entities they move. Stop seeing Bitcoin as a ledger - even though many wallet clients show it as such - it is not a ledger of "payments", but of "coin movements".
So why exactly simulate things that nobody cares about?


maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1011


View Profile
May 28, 2013, 03:02:04 AM
 #334

Bitcoin transactions are only low-level and potentially complex movements of coins. For accounting, you don't care about individual coins. A real-world payment may be settled by several Bitcoin transactions, or a single Bitcoin transaction may be used to fulfill several payments. But nobody needs to care about this.

You don't care how transactions between business identities are settled. You don't care which coins or dollar bills are used, only between which entities they move. Stop seeing Bitcoin as a ledger - even though many wallet clients show it as such - it is not a ledger of "payments", but of "coin movements".
So why exactly simulate things that nobody cares about?

That may be a valid discussion to have. But it's best done elsewhere: we are now very off topic.

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
LvM
Full Member
***
Offline Offline

Activity: 126
Merit: 100


View Profile
May 28, 2013, 11:38:53 AM
 #335

@maaku: We are not "off topic" at all.
Proposed workarounds are the place to discuss the reasons for their (wrongly assumed) necessity.

BTC violates GAAP, result a MESS  https://bitcointalk.org/index.php?topic=211835.0
Anforderungen an eine PROFESSIONELLE BTC-Anwendung https://bitcointalk.org/index.php?topic=189669
BANKGEHEIMNIS mit BTC gleich NULL!? https://bitcointalk.org/index.php?topic=188383 Antwort: Ja, wenn man nicht höllisch aufpaßt.
LvM
Full Member
***
Offline Offline

Activity: 126
Merit: 100


View Profile
May 28, 2013, 11:43:37 AM
 #336

And yes,
I know that some BTC "Heroes" won't even hear evident, irrebuttable facts based on pure logic and thousand-year-old GAAP.

It's hard to concede we were sailing on the wrong ship all along, isn't it?

BTC violates GAAP, the result is a MESS  https://bitcointalk.org/index.php?topic=211835.0
Requirements for a PROFESSIONAL BTC application https://bitcointalk.org/index.php?topic=189669
BANK SECRECY with BTC equals ZERO!? https://bitcointalk.org/index.php?topic=188383 Answer: yes, unless you pay devilishly close attention.
aaaxn
Sr. Member
****
Offline Offline

Activity: 359
Merit: 250



View Profile
May 28, 2013, 11:48:13 AM
 #337

And yes,
I know that some BTC "Heroes" won't even hear evident, irrebuttable facts based on pure logic and thousand-year-old GAAP.

It's hard to concede we were sailing on the wrong ship all along, isn't it?

I think it is pointless trying to convince fellow bitcoiners that their baby is poorly designed. Maybe you should support one of the altcoins which plan to do exactly as you say?

https://bitcointalk.org/index.php?topic=195275.0
or
https://bitcointalk.org/index.php?topic=169204.0


LvM
Full Member
***
Offline Offline

Activity: 126
Merit: 100


View Profile
May 28, 2013, 12:09:40 PM
 #338

And yes,
I know that some BTC "Heroes" won't even hear evident, irrebuttable facts based on pure logic and thousand-year-old GAAP.

It's hard to concede we were sailing on the wrong ship all along, isn't it?

I think it is pointless trying to convince fellow bitcoiners that their baby is poorly designed. Maybe you should support one of the altcoins which plan to do exactly as you say?

https://bitcointalk.org/index.php?topic=195275.0
or
https://bitcointalk.org/index.php?topic=169204.0

@aaaxn
Thank you! Will check it when my time allows.

BTC violates GAAP, the result is a MESS  https://bitcointalk.org/index.php?topic=211835.0
Requirements for a PROFESSIONAL BTC application https://bitcointalk.org/index.php?topic=189669
BANK SECRECY with BTC equals ZERO!? https://bitcointalk.org/index.php?topic=188383 Answer: yes, unless you pay devilishly close attention.
jtimon
Legendary
*
Offline Offline

Activity: 1372
Merit: 1002


View Profile WWW
May 30, 2013, 09:54:06 PM
 #339

While we're still talking about ledger vs. UTXO, I'll try to stay close to the context of this proposal; maybe the ones who prefer a ledger can be satisfied with small changes.
Let's try.
Let's look, for example, at this transaction: http://cryptocoinexplorer.com:4750/tx/8b1afc9aa2ca96c846dce3a47d577068e5f722961eff036b5792756eef28e2a0
The full public key of 18dTnNqj396jL9U98RHYEyJX2TSw6Ku7Gd and the signature are repeated in this transaction, aren't they?
We can modify the UTXO dictionary to avoid this redundancy.

The UTXO dictionary stores (scriptPubKey, balance)

But if it stored a ledger (address/scriptPubKey, balance), you would only need to sign once for 18dTnNqj396jL9U98RHYEyJX2TSw6Ku7Gd in the tx above.
Note that I'm considering a UTXO commitment enforced by the chain, and not an altchain.
The "default script" of just showing the public key

Code:
scriptPubKey: OP_DUP OP_HASH160 <pubKeyHash> OP_EQUALVERIFY OP_CHECKSIG
scriptSig: <sig> <pubKey>

could be assumed and omitted, replaced by this:

Code:
scriptPubKey: <pubKeyHash>
scriptSig: <sig> <pubKey>

But then you would need to indicate explicitly how much you want to spend from the address in the input.
The transaction above would not have had to reference each input individually; it would only have included the address, an amount to subtract, the public key and the signature. Well, and the sequence number or equivalent for that address. I prefer a hash of the last transaction spending from that address; otherwise you need to remember empty addresses and their sequence numbers forever.

Basically, this (paying to an address by subtracting from another address) would be the general case, and the "pay to script", or contract, would be the special case. I think Ripple must do something like this for their ledger and their contract scripts.

Does this make technical sense before we start discussing whether a currency is in itself an accounting system and other philosophical matters?
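To make the comparison tangible, the general-case transaction under that scheme might carry just these fields (the names are mine, not a spec):

Code:
from collections import namedtuple

# General case: credit one address by debiting another.  A P2PKH spend of
# N outputs of one address today repeats <pubKey> and a signature N times;
# here each appears exactly once.
LedgerTx = namedtuple('LedgerTx', [
    'from_addr',     # account being debited
    'amount',        # explicit, since there are no discrete inputs to sum
    'to_addr',       # account being credited
    'prev_tx_hash',  # hash of the last tx spending from from_addr
                     # (replay protection, instead of a sequence number)
    'pubkey',        # revealed once; must hash to from_addr
    'sig',           # one signature over the whole record
])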

2 different forms of free-money: Freicoin (free of basic interest because it's perishable), Mutual credit (no interest because it's abundant)
etotheipi (OP)
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1093


Core Armory Developer


View Profile WWW
May 30, 2013, 10:07:08 PM
 #340

While we're still talking about ledger vs. UTXO, I'll try to stay close to the context of this proposal; maybe the ones who prefer a ledger can be satisfied with small changes.
Let's try.
Let's look, for example, at this transaction: http://cryptocoinexplorer.com:4750/tx/8b1afc9aa2ca96c846dce3a47d577068e5f722961eff036b5792756eef28e2a0
The full public key of 18dTnNqj396jL9U98RHYEyJX2TSw6Ku7Gd and the signature are repeated in this transaction, aren't they?
We can modify the UTXO dictionary to avoid this redundancy.

The UTXO dictionary stores (scriptPubKey, balance)

But if it stored a ledger (address/scriptPubKey, balance), you would only need to sign once for 18dTnNqj396jL9U98RHYEyJX2TSw6Ku7Gd in the tx above.
Note that I'm considering a UTXO commitment enforced by the chain, and not an altchain.
The "default script" of just showing the public key

Code:
scriptPubKey: OP_DUP OP_HASH160 <pubKeyHash> OP_EQUALVERIFY OP_CHECKSIG
scriptSig: <sig> <pubKey>

could be assumed and omitted, replaced by this:

Code:
scriptPubKey: <pubKeyHash>
scriptSig: <sig> <pubKey>

But then you would need to indicate explicitly how much you want to spend from the address in the input.
The transaction above would not have had to reference each input individually; it would only have included the address, an amount to subtract, the public key and the signature. Well, and the sequence number or equivalent for that address. I prefer a hash of the last transaction spending from that address; otherwise you need to remember empty addresses and their sequence numbers forever.

Basically, this (paying to an address by subtracting from another address) would be the general case, and the "pay to script", or contract, would be the special case. I think Ripple must do something like this for their ledger and their contract scripts.

Does this make technical sense before we start discussing whether a currency is in itself an accounting system and other philosophical matters?

I was a little conflicted on this topic.  I see the "elegance" of just using raw scripts.  But I didn't like the idea that multiple forms of the same "effective" address required separate lookups.  For instance, the same address can be used in "DUP HASH160 <Addr160> EQUAL CHECKSIG" as well as "<PubKey> CHECKSIG".  And maybe "<message> DROP <pubKey> CHECKSIG".  Or "<message> DROP DUP HASH <Addr160> EQUAL CHECKSIG".  In fact, I don't even have a good way to search for those message scripts unless I know the message in advance.  It would require a good old-fashioned full-UTXO-set search, which was something we were hoping to avoid with this whole discussion.

But it's also not feasible to capture all script types that are single-sig spendable.  Adding new "standard" script type representations to the UTXO tree later will require a hard-fork.  Perhaps this is a reason not to have things like "<message> OP_DROP", or it's a reason to be more intelligent about how we represent scripts in this tree.  This is why I was conflicted...

Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1011


View Profile
May 31, 2013, 12:01:53 AM
Last edit: May 31, 2013, 12:16:42 AM by maaku
 #341

Well, it is straightforward to demonstrate to a lightweight client that such other script hashes relate to the same address. A full node could respond to a UTXO query with the asked-for data and a message along the lines of "and hey, btw, did you know about these scripts using the same pubkey: ...?" attached at the end. The client can verify for itself that indeed those scripts use the same pubkey or P2SH hash, and then query for those outputs as well.

Obviously this isn't ideal, but it might be good enough (you wouldn't need consensus for this data, just to be attached to at least 1 honest full node maintaining an index of this information). Or I suppose yet another (3rd!) index could be maintained mapping addresses to scripts actually present in the UTXO set, but as you note that wouldn't be future-proof.

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
oakpacific
Hero Member
*****
Offline Offline

Activity: 784
Merit: 1000


View Profile
June 02, 2013, 12:58:31 PM
 #342

Is it possible for a miner to only download full blocks from the last checkpoint and still validate transactions?

https://tlsnotary.org/ Fraud proofing decentralized fiat-Bitcoin trading.
maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1011


View Profile
June 02, 2013, 04:11:13 PM
 #343

Possible? Yes. Desirable? No. It's important that miners verify that they haven't been duped onto a side chain. It is, however, okay for them to throw away those historical transactions once they have been verified and just keep the UTXO set.

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
etotheipi (OP)
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1093


Core Armory Developer


View Profile WWW
June 02, 2013, 10:41:13 PM
 #344

Well, it is straightforward to demonstrate to a lightweight client that such other script hashes relate to the same address. A full node could respond to a UTXO query with the asked-for data and a message along the lines of "and hey, btw, did you know about these scripts using the same pubkey: ...?" attached at the end. The client can verify for itself that indeed those scripts use the same pubkey or P2SH hash, and then query for those outputs as well.

Obviously this isn't ideal, but it might be good enough (you wouldn't need consensus for this data, just be attached to at least 1 honest full node maintaining an index of this information). Or I suppose yet another (3rd!) index can be maintained mapping addresses to scripts actually present in the UTXO set, but as you note that wouldn't be future-proof.

There are two problems with this logic:
(1) Every request is X-fold more work for the serving peer just to catch the 0.01% of addresses that have multiple forms in the blockchain.  It has to do multiple lookups for each request.
(2) Something like "<message> OP_DROP ..." is a serious problem for this proposal.  The task of "find all UTXOs to address/pubkey X, which have a message prefixed to it" requires a full search of the UTXO space.   Such scripts lose all benefit of this proposal.  In fact, any node using these kinds of scripts will have to use the original full-node-but-pruned logic, unless the extraneous data is totally deterministic/predictable.

Number 2 is concerning, because even if nodes somehow know all the messages they are expecting to see, the proofs of existence (or non-existence) are on isolated branches and require quite a bit more data to prove than if they were all clustered on one branch.   And if they don't know what the messages are (or the other data unrelated to the addr/pubkey), then the peer might be expected to do the full-UTXO search.  They might end up adding extra metadata to their database just to accommodate these kinds of requests. 

On the other hand, since address re-use is a bad idea, maybe the argument about isolated branches is irrelevant. 

I have an idea that is beyond infeasible, but it's good food for thought.  Maybe it's useless, but what the hell:

For a given script, if it contains a single public key or a single hash160 value (how do we define this?), then we take the hash160 value, or compute it from the public key, and prefix it to the script, replacing the value actually in the script with a 0xff byte (or something unused).  This is okay technically (if it were possible to identify which values will be used as pubkeys/hash160s), because it's not actually a requirement for the lookup key to match the script.  The real script will still be stored at the leaf node.   So "DUP HASH160 <addr160> EQUALVERIFY CHECKSIG" would be keyed:  "<addr160> DUP HASH160 0xFF EQUALVERIFY CHECKSIG". 

This theoretically solves all the problems at once, because it doesn't really matter what else is in the script, as long as the "controlling address" is first.  Then it's trivial to prove inclusion or exclusion of data for a given address.    But of course, it's probably intractable to figure out how to reduce arbitrary scripts to "controlling addresses." 
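For concreteness, here is a minimal sketch (in Python, purely hypothetical, not Armory code) of the keying transform just described.  The hash160() helper is the standard Bitcoin construction; controlling_address_key() and its "embedded" argument are illustrative, script parsing (push opcodes, etc.) is glossed over, and deciding which value in an arbitrary script is the controlling one is exactly the open problem noted above:

Code:
import hashlib

def hash160(data):
    """Standard Bitcoin hash160: RIPEMD160(SHA256(data)).
    (Assumes the underlying OpenSSL provides ripemd160.)"""
    sha = hashlib.sha256(data).digest()
    return hashlib.new('ripemd160', sha).digest()

def controlling_address_key(script, embedded):
    """Key a script by its controlling address: prefix the hash160 of the
    embedded pubkey/hash160 value, and replace that value in the script
    body with an unused placeholder byte (0xFF)."""
    addr160 = embedded if len(embedded) == 20 else hash160(embedded)
    return addr160 + script.replace(embedded, b'\xff')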





Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1011


View Profile
June 03, 2013, 12:13:51 AM
 #345

I think one of us is misunderstanding the other. Bitcoin already has code for extracting either pubkeys, hash160(pubkey)'s, or hash160(script)'s from arbitrary transactions as they come in. That's how the wallet code knows whether a transaction is yours or not.

What I'm suggesting is that some full nodes remember and index these results for every transaction in the UTXO set, creating a map of hash160 -> script variants. They then expose the ability for other nodes to query this map, or proactively do so if/when they receive a UTXO query.

This hash160 -> script(s) map doesn't need to be deterministic or authenticated in any way. If given a script, anyone can examine it to see that yes, it does indeed have the relevant pubkey, hash160(pubkey), or p2sh embedded within it, and query the authenticated UTXO index to see that yes, there are unspent outputs using this script. Therefore we don't need to solve the possibly intractable, definitely not future-proof problem of figuring out a general way to match arbitrary scripts to “controlling addresses.” We can use bitcoind's current logic, and are free to update that logic on each release.
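A toy sketch of that map (Python; extract_hash160s() is a stand-in for bitcoind's existing pubkey/script extraction logic, which this does not reproduce):

Code:
from collections import defaultdict

# Unauthenticated index: hash160 -> set of raw scripts seen in the UTXO set
script_variants = defaultdict(set)

def index_new_output(script):
    """For each new txout, file its script under every pubkey hash or
    script hash embedded in it."""
    for h160 in extract_hash160s(script):    # assumed helper
        script_variants[h160].add(script)

def variants_for(h160):
    """Answer a client query.  The client re-verifies that each returned
    script really embeds h160, then checks the authenticated UTXO index;
    only completeness has to be taken on faith."""
    return script_variants[h160]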

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
etotheipi (OP)
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1093


Core Armory Developer


View Profile WWW
June 03, 2013, 12:25:22 AM
 #346

I think one of us is misunderstanding the other. Bitcoin already has code for extracting either pubkeys, hash160(pubkey)'s, or hash160(script)'s from arbitrary transactions as they come in. That's how the wallet code knows whether a transaction is yours or not.

What I'm suggesting is that some full nodes remember and index these results for every transaction in the UTXO set, creating a map of hash160 -> script variants. They then expose the ability for other nodes to query this map, or proactively do so if/when they receive a UTXO query, and provide a p2p message for gossiping these mappings.

This hash160 -> script(s) map doesn't need to be deterministic or authenticated in any way. If given a script, anyone can examine it to see that yes, it does indeed have the relevant pubkey, hash160(pubkey), or p2sh embedded within it, and query the authenticated UTXO index to see that yes, there are unspent outputs using this script. Therefore we don't need to solve the possibly intractable, definitely not future-proof problem of figuring out a general way to match arbitrary scripts to “controlling addresses.” We can use bitcoind's current logic, and are free to update that logic on each release.

Okay, I was proposing a change to the way the UTXO tree was keyed, for the purpose of having a consistent way to look up balances/UTXOs for hash160 values instead of raw scripts.  You are proposing we keep it keyed/indexed by raw script, but the nodes can store their own metadata for those addresses as they see recognizable scripts.   It would not be a requirement to serve the entirety of all UTXOs spendable by someone with the private key of a given hash160, but it can still be useful. 

Is that a correct interpretation of what you're saying?

My problem with that is that it isn't deterministic whether you will be able to find all your coins.  Maybe it doesn't matter:  "Use standard scripts if you want to be able to find all your coins efficiently.  Otherwise, you're on your own."  With the way you suggest it, if you get lucky, nodes have the data pre-indexed for you, and have all of it.  But you can't count on it, and they can't prove whether they supplied you everything.  This makes it considerably less useful, and possibly not useful (nodes need to be able to know if they have everything, or they'll do something else that guarantees it).

I think ultimately the raw script indexing is the correct answer.  I'm just exploring alternatives, that make the multi-script-single-address problem a little more efficient (and reliable).


Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
oakpacific
Hero Member
*****
Offline Offline

Activity: 784
Merit: 1000


View Profile
June 03, 2013, 12:36:19 AM
 #347

Possible? Yes. Desirable? No. It's important that miners verify that they haven't been duped onto a side chain. It is, however, okay for them to throw away those historical transactions once they have been verified and just keep the UTXO set.

Yeah, I did not mention the UTXO set because I thought it's obvious.

The reason I brought this up is that I believe a lot of us are willing to run a USB miner to secure the network without generating any noticeable revenue.  Now that these miners are out and very power-efficient, the power cost of keeping one running is nearly negligible; but if we have to download and store the rapidly growing full chain, the cost may grow significantly.

https://tlsnotary.org/ Fraud proofing decentralized fiat-Bitcoin trading.
maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1011


View Profile
June 03, 2013, 01:21:21 AM
 #348

I propose that we keep the UTXO index keyed by hash160(scriptPubKey) - for different reasons, primarily for the benefits of a constant key size. That would make things more difficult for your proposal. But yes, other than that we are interpreting each other correctly now.

Quote from: etotheipi
My problem with that is that it isn't deterministic whether you will be able to find all your coins.  Maybe it doesn't matter:  "Use standard scripts if you want to be able to find all your coins efficiently.  Otherwise, you're on your own."  With the way you suggest it, if you get lucky, nodes have the data pre-indexed for you, and have all of it.  But you can't count on it, and they can't prove whether they supplied you everything.  This makes it considerably less useful, and possibly not useful (nodes need to be able to know if they have everything, or they'll do something else that guarantees it).

I pretty much agree with you, except the last point. It wouldn't be something you'd want to rely upon, but it would be better than nothing, and perhaps there would be valuable use cases when connected to a trusted node. My hesitation is with specifying a general algorithm for identifying addresses/hashes to prefix scripts with. One that is deterministic and future-proof, so that we can use it in an authenticated data structure.

Possible? Yes. Desirable? No. It's important that miners verify that they haven't been duped onto a side chain. It is, however, okay for them to throw away those historical transactions once they have been verified and just keep the UTXO set.

Yeah, I did not mention the UTXO set because I thought it's obvious.

The reason I brought this up is that I believe a lot of us are willing to run a USB miner to secure the network without generating any noticeable revenue.  Now that these miners are out and very power-efficient, the power cost of keeping one running is nearly negligible; but if we have to download and store the rapidly growing full chain, the cost may grow significantly.

The miner could validate the entire history or synchronize with constant storage requirements, throwing away data as it is no longer required.

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
etotheipi (OP)
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1093


Core Armory Developer


View Profile WWW
June 03, 2013, 01:33:56 AM
 #349

I propose that we keep the UTXO index keyed by hash160(scriptPubKey) - for different reasons, primarily for the benefits of a constant key size. That would make things more difficult for your proposal. But yes, other than that we are interpreting each other correctly now.

If we're going to use "essentially" raw scripts, we'll save quite a bit of space by making them the key of the tree, since then we don't have to store the actual script at the leaf nodes.  I am blanking on all the other data that needs to be stored, but it might be a considerable savings (I haven't thought about this in a while).   Well, it would save exactly 20 bytes per leaf.  Right now I think there are about 6 million UTXOs, so that would be 120 MB of savings.   

The constant key size would be important in a trie, but in a level-compressed, PATRICIA-like tree, it shouldn't make a difference. 

Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1011


View Profile
June 03, 2013, 02:09:37 AM
 #350

Well, that's a reasoned argument I can listen to. Might I suggest then, that the key format be the following

Code:
<20 bytes> idx:var_int <...n-bytes...>

Where idx is the index in the range of [0, n] specifying where the prefixed 20 bytes should be spliced into the script. This step is skipped for scripts/keys of 20 bytes or less.

The only issue then is an unambiguous algorithm for selecting which 20-bytes to pull out for the prefix.
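For concreteness, a minimal sketch of the splice itself (Python; the var_int is simplified to a single length byte, and the 20-byte selection rule is precisely the open issue just mentioned):

Code:
def encode_key(script, offset):
    """<20 bytes> idx:var_int <rest of script with those bytes removed>.
    'offset' is where the chosen 20 bytes sit in the original script."""
    if len(script) <= 20:
        return script                        # short scripts pass through
    prefix = script[offset:offset + 20]
    remainder = script[:offset] + script[offset + 20:]
    return prefix + bytes([offset]) + remainder

def decode_key(key):
    """Reverse the splice to recover the original script."""
    if len(key) <= 20:
        return key
    prefix, offset, remainder = key[:20], key[20], key[21:]
    return remainder[:offset] + prefix + remainder[offset:]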

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
etotheipi (OP)
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1093


Core Armory Developer


View Profile WWW
June 03, 2013, 02:13:32 AM
 #351

Well, that's a reasoned argument I can listen to. Might I suggest then, that the key format be the following

Code:
<20 bytes> idx:var_int <...n-bytes...>

Where idx is the index in the range of [0, n] specifying where the prefixed 20 bytes should be spliced into the script. This step is skipped for scripts/keys of 20 bytes or less.

The only issue then is an unambiguous algorithm for selecting which 20-bytes to pull out for the prefix.

Hah, you just jumped all the way to the other side.  I was actually just suggesting that if we're not going to do anything fancy with the scripts, we should use the exact script as the key instead of its hash.  That way we don't have to store the script itself in both the key and the value of the leaf node (whereas if only the hash160(rawscript) were in the key, the full script would still have to be stored at the leaf).

Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1011


View Profile
June 03, 2013, 04:21:59 AM
 #352

Oh, my conservative position is still that prefixing the index key is the wrong way to solve this problem, but I'm willing to explore the idea.

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
ThomasV
Legendary
*
Offline Offline

Activity: 1896
Merit: 1343



View Profile WWW
June 03, 2013, 12:30:03 PM
 #353

I believe this proposal is of primary importance for Electrum.
I started to work on it a few weeks ago, in order to add it to Electrum servers. I finalized two preliminary implementations this past weekend.

I am pretty agnostic concerning the choice of keys; I guess hash160(rawscript) makes sense.

I would like to make the following point:
It is possible to compute node hashes much faster if you store the hash of each node at its parent's database entry.
That way, it is not necessary to perform database requests on all the children when only one child is updated.
In order to do that, it is necessary to keep a list of child pointers at each node; this list uses a bit more space (20 bytes/node).
Thus, each node stores a list of pointers (20 bytes) and a variable-length list of hash:sumvalue entries for its children.
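A rough sketch of that node layout (Python; the Node fields and the db interface are assumptions for illustration, not the actual Electrum-server schema):

Code:
import hashlib

class Node:
    def __init__(self):
        # child key -> (db_pointer, child_hash, child_sumvalue)
        self.children = {}

    def node_hash(self):
        """Hash over the cached child hashes and value sums; no child
        records need to be fetched from the database."""
        h = hashlib.sha256()
        for key in sorted(self.children):
            _ptr, child_hash, sumvalue = self.children[key]
            h.update(child_hash)
            h.update(sumvalue.to_bytes(8, 'big'))
        return h.digest()

def update_child(db, parent, key, new_hash, new_sum):
    """After one child changes: one read (the parent) and one write."""
    ptr, _, _ = parent.children[key]
    parent.children[key] = (ptr, new_hash, new_sum)
    db.write(parent)    # assumed db interface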

I made two separate implementations:
- a "plain vanilla" version without pointers, where a node's hash is stored at the node; this implementation was too slow to be practical.
- a faster version that stores node hashes at their parent, and keeps a list of pointers for each node.

Both versions are available on github, in the Electrum server code: https://github.com/spesmilo/electrum-server
(look for branches "hashtree" and "hashtree2")

Both branches were tested with randomly generated blockchain reorgs, and they produced the same root hash.

I was able to run the "hashtree2" version for 184k blocks on my VPS, and another user went over 200k using a faster machine, but it still took him more than 24h.
I am currently working on a third version that will use write batches when computing the hashes; I hope to accelerate it further that way.


Electrum: the convenience of a web wallet, without the risks
maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1011


View Profile
June 03, 2013, 03:57:20 PM
 #354

Am I reading the source code correctly that you are doing a standard Merkle-list for the UTXO tree? I couldn't find anything that looked like balanced tree updates. I'd think that's the root of your inefficiency right there - PATRICIA trees are a big part of this proposal.

You are right that this impacts Electrum significantly. We should coordinate our efforts.

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
ThomasV
Legendary
*
Offline Offline

Activity: 1896
Merit: 1343



View Profile WWW
June 03, 2013, 04:15:32 PM
 #355

Am I reading the source code correctly that you are doing a standard Merkle-list for the UTXO tree? I couldn't find anything that looked like balanced tree updates. I'd think that's the root of your inefficiency right there - PATRICIA trees are a big part of this proposal.

I use a PATRICIA tree for addresses, and a simple list for the UTXOs that belong to the same address.
I remember discussing this question on IRC; we were not sure whether it was better to store UTXOs as database entries or to use addresses for the leaves of the tree (which is what I do).

Note that if we use a PATRICIA tree of UTXOs, we might end up doing more database queries for the hashes; what makes you think it would be less efficient?



Electrum: the convenience of a web wallet, without the risks
ShadowOfHarbringer
Legendary
*
Offline Offline

Activity: 1470
Merit: 1005


Bringing Legendary Har® to you since 1952


View Profile
June 05, 2013, 10:13:26 AM
 #356

* ShadowOfHarbringer is watching this.

jtimon
Legendary
*
Offline Offline

Activity: 1372
Merit: 1002


View Profile WWW
June 06, 2013, 11:48:38 AM
 #357

I've been thinking that if the indexes are put directly in each main-chain block AND miners include a "signature" demonstrating computational integrity of all transaction validations in the chain, new full nodes only need to download the last block and are completely safe!!!

I wonder if block N's signature can be combined somehow with block N+1's transactions...
Does anybody know Ben's nick on the forum?

2 different forms of free-money: Freicoin (free of basic interest because it's perishable), Mutual credit (no interest because it's abundant)
d'aniel
Sr. Member
****
Offline Offline

Activity: 461
Merit: 251


View Profile
June 09, 2013, 11:22:03 PM
 #358

I've been thinking that if the indexes are put directly in each main-chain block AND miners include a "signature" demonstrating computational integrity of all transaction validations in the chain, new full nodes only need to download the last block and are completely safe!!!

I wonder if block N's signature can be combined somehow with block N+1's transactions...
Does anybody know Ben's nick on the forum?

After watching that video I can't help but think, with my very limited understanding of it, that SCIP combined with appropriate Bitcoin protocol changes (perhaps, as you mentioned, localizing the full state in the blockchain using an authenticated UTXO tree) would be able to remove most of the redundant reproduction of work the network currently must do in order to operate trust-free.  It could also make it possible to shard, across untrusted peers, the work of combining new transactions into Merkle trees to produce new block headers for miners to work on.  Together, these would mean the network could remain perfectly decentralized at ridiculously high transaction rates (the work done per node would, I think in theory, scale as O(M log(M) / N), where M is the transaction rate and N is the total number of network nodes).  This might even mean an always-on Zerocoin is feasible (always-on is important so that the anonymity set is maximal, and its users aren't a persecutable (relative) minority).

Anybody with a better understanding of SCIP and its applicability to Bitcoin able to pour cold water on these thoughts for me?
etotheipi (OP)
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1093


Core Armory Developer


View Profile WWW
June 09, 2013, 11:51:53 PM
Last edit: June 10, 2013, 02:13:25 AM by etotheipi
 #359

I don't have any real understanding of SCIP, but I did talk to the guys behind it, at the conference.  They are very excited about their research, and it clearly is quite powerful if it works.  However, they did say that it is extremely complicated,  and even if it does work, it may have a tough time getting confidence from any security-conscious community due to its complexity.   I imagine it will need years of fielding in order for it to actually become an option for any important application.

And of course, I have my doubts that it really works.  It sounds too good to be true, but admittedly, I haven't had time to try to understand it at the technical level yet.   One day I'll try to find some time to dig into it.  But until then, we certainly can't count on it being available for the Reiner-tree.

I'd be extremely interested to see someone with the correct background dig into it and provide a technical overview of how it works.

Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
Peter Todd
Legendary
*
expert
Offline Offline

Activity: 1120
Merit: 1147


View Profile
June 10, 2013, 12:19:24 AM
 #360

After watching that video I can't help but think, with my very limited understanding of it, that SCIP combined with appropriate Bitcoin protocol changes (perhaps, as you mentioned, localizing the full state in the blockchain using an authenticated UTXO tree) would be able to remove most of the redundant reproduction of work the network currently must do in order to operate trust-free.  It could also make it possible to shard, across untrusted peers, the work of combining new transactions into Merkle trees to produce new block headers for miners to work on.  Together, these would mean the network could remain perfectly decentralized at ridiculously high transaction rates (the work done per node would, I think in theory, scale as O(M log(M) / N), where M is the transaction rate and N is the total number of network nodes).  This might even mean an always-on Zerocoin is feasible (always-on is important so that the anonymity set is maximal, and its users aren't a persecutable (relative) minority).

Anybody with a better understanding of SCIP and its applicability to Bitcoin able to pour cold water on these thoughts for me?

You're actually quite correct. It solves the censorship problem too because the "upper levels" of this merkle tree of transactions are still cheap to validate so mining itself remains cheap. You do have issues where someone may create an imbalanced tree - the validation rules will need to have the levels of the merkle tree be sorted - but the work required to imbalance the tree increases exponentially. To be exact, it will be a patricia/radix tree rather than a merkle tree.

However, SCIP is probably years away from getting to the point where we could use it in the Bitcoin core. One big issue is that a SCIP proof for a validated merkle tree has to be recursive: you need to create a SCIP proof that you ran a program that correctly validated a SCIP proof. Creating those recursive proofs is extremely expensive; gmaxwell can say more, but his rough estimate is that we'd have to hire a big fraction of Amazon EC2 and assemble a cluster of machines with hundreds of terabytes of RAM. But math gets better over time, so there is hope.


A non-SCIP approach that we can do now would be to use fraud detection with punishment. Peers assemble some part of the merkle tree and digitally sign, with an identity, that they have done so honestly. (A commutative accumulator is another possibility.) The tree is probabilistically validated, and any detected fraud is punished somehow, perhaps by destroying a fidelity bond that the peer holds.  You still need some level of global consensus so that the act of destroying a bond is meaningful, of course, and there are a lot of tricky details to get right, but the rough idea is plausible with the cryptography available to us now.

d'aniel
Sr. Member
****
Offline Offline

Activity: 461
Merit: 251


View Profile
June 10, 2013, 01:16:47 AM
Last edit: June 10, 2013, 02:35:24 AM by d'aniel
 #361

However, SCIP is probably years away from getting to the point where we could use it in the Bitcoin core. One big issue is that a SCIP proof for a validated merkle tree has to be recursive: you need to create a SCIP proof that you ran a program that correctly validated a SCIP proof. Creating those recursive proofs is extremely expensive; gmaxwell can say more, but his rough estimate is that we'd have to hire a big fraction of Amazon EC2 and assemble a cluster of machines with hundreds of terabytes of RAM. But math gets better over time, so there is hope.
The talk left me with the impression that their non-recursive SCIP proofs are inexpensive, so I wonder if recursion could be avoided.  For example, if the full state were encoded locally in pairs of adjacent blocks  - as the proposal in this thread would achieve - then a SCIP proof validating the next block could simply assume validity of the two prior blocks, which is fine if the node verifying this proof has verified the SCIP proofs of all preceding blocks as well.  Once blocks become individually unwieldy, perhaps verifying each block would simply take a few extra SCIP proof validations - with SCIP proof authors tackling the transaction and UTXO patricia/radix tree updates by branches.  Could this approach properly remove the need to nest SCIP proofs inside of SCIP proofs, or is there something obvious I'm missing?

Edit: I suppose this would mean that Alice would be sending a slightly different program for Bob to run to produce each SCIP proof in each block?   I guess these programs would have to be a protocol standard, since 'Alice' is really everybody, and would differ only by the hash of the previous block?  All of this is very vague and magical to me still...
d'aniel
Sr. Member
****
Offline Offline

Activity: 461
Merit: 251


View Profile
June 10, 2013, 03:29:05 AM
 #362

A non-SCIP approach that we can do now would be to use fraud detection with punishment. Peers assemble some part of the merkle tree and digitally sign, with an identity, that they have done so honestly. (A commutative accumulator is another possibility.) The tree is probabilistically validated, and any detected fraud is punished somehow, perhaps by destroying a fidelity bond that the peer holds.  You still need some level of global consensus so that the act of destroying a bond is meaningful, of course, and there are a lot of tricky details to get right, but the rough idea is plausible with the cryptography available to us now.
I do like this approach as well, and hadn't thought to use fidelity bonds for expensive punishment of misbehaving anonymous 'miner helpers'.  Though, unlike a SCIP approach, it is susceptible to attacks on the p2p network: surrounding groups of nodes and blocking the relay of fraud proofs to them.  Not sure how important this is in practice, though.
Peter Todd
Legendary
*
expert
Offline Offline

Activity: 1120
Merit: 1147


View Profile
June 11, 2013, 11:43:05 PM
 #363

The talk left me with the impression that their non-recursive SCIP proofs are inexpensive, so I wonder if recursion could be avoided.  For example, if the full state were encoded locally in pairs of adjacent blocks  - as the proposal in this thread would achieve - then a SCIP proof validating the next block could simply assume validity of the two prior blocks, which is fine if the node verifying this proof has verified the SCIP proofs of all preceding blocks as well.  Once blocks become individually unwieldy, perhaps verifying each block would simply take a few extra SCIP proof validations - with SCIP proof authors tackling the transaction and UTXO patricia/radix tree updates by branches.  Could this approach properly remove the need to nest SCIP proofs inside of SCIP proofs, or is there something obvious I'm missing?

If you are talking about pairs of adjacent blocks, all you've achieved is making validation of the chain possibly a bit cheaper; those creating the blocks still need to have the full UTXO set.


Going back to the merkle tree thing it occurs to me that achieving synchronization is really difficult. For instance if the lowest level of the tree is indexed by tx hash, you've achieved nothing because there is no local UTXO set consensus.

If the lowest level of the tree is indexed by txout hash, H(txid:vout), you now have the problem that you basically have a set of merge-mined alt-coins. Suppose I have a txout whose hash starts with A and I want to spend it in a transaction that would result in a txout with a hash starting with B.

So I create a transaction spending that txout in chain A, destroying the coin in that chain, and use the merkle path to "prove" to chain B that the transaction happened and chain B can create a coin out of thin air. (note how the transaction will have to contain a nothing-up-my-sleeve nonce, likely a blockhash from chain B, to ensure you can't re-use the txout)

This is all well and good, but a 51% attack on just chain A, which overall might be a 5% attack, is enough to create coins out of thin air because chain B isn't actually able to validate anything other than there was a valid merkle path leading back to the chain A blockheader. It's not a problem with recursive SCIP because there is proof the rules were followed, but without you're screwed - at best you can probabilistically try to audit things, which just means an attacker gets lucky periodically. You can try to reverse the transaction after the fact, but that has serious issues too - how far back do you go?

Achieving consensus without actually having a consensus isn't easy...

I do like this approach as well, and hadn't thought to use fidelity bonds for expensive punishment of misbehaving anonymous 'miner helpers'.  Though, unlike a SCIP approach, it is susceptible to attacks on the p2p network: surrounding groups of nodes and blocking the relay of fraud proofs to them.  Not sure how important this is in practice, though.

Bitcoin in general assumes a jam-proof P2P network is available.

An important issue is that determining how to value the fidelity bonds would be difficult; at any time the value of the bond must be more than the return on committing fraud. That's easy to do in the case of a bank with deposits denominated in BTC, much harder to reason about when you're talking about keeping an accurate ledger.

Peter Todd
Legendary
*
expert
Offline Offline

Activity: 1120
Merit: 1147


View Profile
June 12, 2013, 12:21:07 AM
 #364

Here's a rough sketch of another concept:

Suppose you have 2*k blockchains where each blockheader is actually the header of two blocks, that is, of chain n mod 2*k and chain (n+1) mod 2*k.  In English: picture a ring of blockchains, in which miners "mine" pairs of adjacent chains.

The rule is that the heights of any adjacent pair of chains can differ by no more than 1 block, and finding a valid PoW creates a pair of blocks with an equal reward in each chain. Because the miners get the equal reward, they have an incentive to honestly mine both chains, or they'd produce an invalid block and lose that reward. To move coins between one chain and its neighbor, create a special transaction doing so, which will be validated fully because a miner will have full UTXO-set knowledge for both chains. Of course, this means it might take k steps to actually get a coin moved from one side of the ring to the other, but the movement will be fully validated the whole way around.

Again, what used to be a 51% attack can now become something much weaker. On the other hand, because the data needed to store the PoWs and block headers (but not full blocks) is small, the PoWs for one pair of chains can include the hashes of every chain, and the system can simply treat that extra PoW as an additional hurdle for an attacker trying to rewrite any individual chain. What a 51% attack on a pair of chains involves is actually managing to get into a situation where you are the only person bothering to mine that particular pair of chains - hopefully a much higher barrier if people pick the pair of chains they validate randomly.

The ring is just a nice example; in reality I think it'd be good enough to just have the n chains and let miners pick pairs of chains to mine. The number of pairs that needs to be mined for a fully interconnected set is n(n-1) ~= n^2. The big advantage of a fully connected set is that the slicing can happen on a per-txout-hash basis, i.e. a transaction spending a txout starting with A and creating a txout starting with B can be mined by anyone mining both the A and B chains, though note that you'll wind up paying fees for both, and with more outputs you can wind up with a partially confirmed transaction. Also note how a miner with only the UTXO set for the A chain can safely mine that transaction by simply creating a 1-transaction block in the B chain... ugly. You probably need proof-of-UTXO-set-possession on top of proof-of-work to keep the incentives correct.
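A toy model of just the ring geometry (Python; everything here is illustrative only, none of it is a worked-out consensus rule):

Code:
def mine_pair(heights, n):
    """Extend adjacent chains n and (n+1) mod 2k together, enforcing the
    rule that adjacent chain heights never differ by more than 1 block.
    'heights' is the list of current chain heights around the ring."""
    k2 = len(heights)
    m = (n + 1) % k2
    new_heights = list(heights)
    new_heights[n] += 1          # one PoW creates a block in both chains,
    new_heights[m] += 1          # so the pair stays at equal height
    # check each member of the pair against its outside neighbour
    for a, b in (((n - 1) % k2, n), (m, (m + 1) % k2)):
        if abs(new_heights[a] - new_heights[b]) > 1:
            raise ValueError("height constraint violated")
    return new_heights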

We've created weird incentives for hashers because moment to moment the reward (fees) for mining each pair will be determined by the transactions required to bridge that pair, so pools will pop up like crazy and your mining software will pool hop automatically - another perverse result in a system designed to aid decentralization, although probably a manageable one with probabilistic auditing.

Maybe the idea works, but I'll have to think very carefully about it... there's probably a whole set of attacks and perverse incentives lurking in the shadows...

maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1011


View Profile
June 12, 2013, 12:22:14 AM
 #365

If you are talking about pairs of adjacent blocks all you've achieved is making validating the chain possibly a bit cheaper,

*A lot* cheaper. But anyway:

those creating the blocks still need to have the full UTXO set.
To create a transaction you only need access to your own inputs. Why would you need the full UTXO set?

Going back to the merkle tree thing it occurs to me that achieving synchronization is really difficult. For instance if the lowest level of the tree is indexed by tx hash, you've achieved nothing because there is no local UTXO set consensus.
Can you explain this?

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
etotheipi (OP)
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1093


Core Armory Developer


View Profile WWW
June 12, 2013, 03:10:05 AM
 #366

For reference on the synchronization question, I will reference one of my previous posts.  It was a thought-experiment to figure out how to download the Reiner-tree between nodes, given that the download will take a while and you'll get branch snapshots at different block heights:

https://bitcointalk.org/index.php?topic=88208.msg1408410#msg1408410

I just wanted to make sure it wasn't something to be concerned about (like all sorts of hidden complexity).  It looks like it's workable.

Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
d'aniel
Sr. Member
****
Offline Offline

Activity: 461
Merit: 251


View Profile
June 16, 2013, 01:20:13 AM
Last edit: June 16, 2013, 05:24:37 AM by d'aniel
 #367

Regarding having nested subtries for coins with the same scriptPubKeys, I wonder if it's such a good idea to complicate the design like this in order to accommodate address reuse?  Address reuse is discouraged for privacy and security reasons, and will become increasingly unnecessary with the payment protocol and deterministic wallets.

Also, was there a verdict on the 2-way (bitwise) trie vs. 256-way + Merkle trees in each node?  I've been thinking lately about sharding block creation/verification, and am noticing the advantages of the bitwise trie since its updates require a much more localized/smaller set of data.
maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1011


View Profile
June 16, 2013, 02:57:19 AM
 #368

I've pushed an in-memory hybrid PATRICIA-Briandais tree implementation to github:

https://github.com/maaku/utxo-index

I may experiment with the internal structure of this tree (for example: different radix sizes, script vs hash(script) as key, storing extra information per node). 2-way tries probably involve way too much overhead, but I think a convincing argument could be made for 16-way tries (two levels per byte). Once I get a benchmark runner written we can get some empirical evidence on this.

Having sub-trees isn't so much about address reuse as it is that two different keys are needed: the key is properly (script, txid:n). In terms of implementation difficulty I don't think it's actually that much more complicated. But again, we can empirically determine this.

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
d'aniel
Sr. Member
****
Offline Offline

Activity: 461
Merit: 251


View Profile
June 16, 2013, 04:57:49 PM
 #369

I've pushed an in-memory hybrid PATRICIA-Briandais tree implementation to github:

https://github.com/maaku/utxo-index
Cool!

On second thought, I don't think the radix size really matters too much for sharding the node.  The choice of keying OTOH...
d'aniel
Sr. Member
****
Offline Offline

Activity: 461
Merit: 251


View Profile
June 16, 2013, 04:58:51 PM
Last edit: June 17, 2013, 04:21:42 AM by d'aniel
 #370

@retep, here's roughly how I imagine sharding a node could be done without diluting hashing power across multiple separate chains (that sounds terrible!):

First I'll assume we include in the block headers the digest of a utxo tree keyed by (txid:n, script) instead of by (script, txid:n), as this will turn out to be much more natural, for this purpose at least.  Second, I'll assume the tx digest is created from the authenticated prefix tree of their txids, which will also turn out to be much more natural.  (Last-minute thought: doesn't the tx ordering matter in the usual tx Merkle tree, i.e. earlier txs can't spend TxOuts created by later txs?  Or can it just be assumed that the block is valid if there exists some valid ordering, which is up to the verifier to construct?)  The radix size turns out not to matter, but let's call it k.

Distributed block construction

Division of labor is as follows: We have a coordinator who directs the efforts of N well-mirrored branch curators, who separately update each of the utxo tree branches below level log_k(N) and process subblocks of any number of transactions.

A branch curator downloads the incoming txs whose txids lie in his particular branch.  Notice that due to our convenient choice of keying, all of his newly created TxOuts will lie in his own branch.  For each TxIn in a given tx, he needs to download the corresponding TxOut from his relevant well-mirrored counterparts. Note that TxOuts will always be uniquely identifiable with only a few bytes, even for extremely large utxo sets. Also, having to download the TxOuts for the corresponding TxIns isn't typically that much extra data, relatively speaking - ~40 bytes/corresponding TxOut, compared to ~500 bytes for the average tx having 2-3 TxIns.  With just these TxOuts, he can verify that his txs are self-consistent, but cannot know whether any given TxOut has already been spent.
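A minimal sketch of that routing (Python; branch_of() is hypothetical and assumes the number of curators is a power of the radix, so branches align with trie levels below log_k(N)):

Code:
def branch_of(txid, num_branches):
    """Map a txid to one of N branch curators by its leading bits.  With
    (txid:n, script) keying, every output a tx creates lands in the same
    curator's branch as the tx itself."""
    bits = num_branches.bit_length() - 1      # e.g. N=256 -> top 8 bits
    return int.from_bytes(txid[:4], 'big') >> (32 - bits)

# e.g. with N=16 curators, a txid beginning 0xAB... routes to branch 0xA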

This is where the coordinator comes into play.  He cycles through the N branches, and for each branch, nominates one of the curator mirrors that wishes to submit a subblock.  This branch curator then gathers a bunch of self-consistent txs, and compresses the few byte ids of their TxIns into a prefix tree.  He sends his respective counterparts - or rather, one of their mirrors who are up to date with the previous subblock - the appropriate branches, and they send back subbranches of those that are invalid with respect to the previous subblock.  Note that this communication is cheap - a few bytes per tx.  He then removes the invalid txs from his bunch, informs his counterparts of the TxIns that remain so they can delete the corresponding utxos from their respective utxo tree branches, deletes those relevant to him, inserts all of his newly created TxOuts into his utxo tree branch, and builds his tx tree.  He submits his tx and utxo tree root hashes to the coordinator, who also gathers the other branch curators' updated utxo tree root hashes.  This data is used to compute the full tx and utxo tree root hashes, which is then finally submitted to miners.

When the coordinator has cycled through all N branches, he goes back to the first who we note can perform very efficient updates to his existing tx tree.

Some notes:

  • Mutual trust between all parties was assumed in lots of ways, but this could be weakened using a fraud detection and punishments scheme - ingredients being e.g. authenticated sum trees, fidelity bonds, and lots of eyes to audit each step.  Trusted hardware or SCIP proofs at each step would be the ideal future tech for trust-free cooperation.
  • The job of the coordinator is cheap and easy.  The branch curators could all simultaneously replicate all of its functions except nominating subblock submissions.  For that they'd need a consensus-forming scheme.  Perhaps miners including in their coinbase a digest of their preferred next several subblock nominees, and broadcasting sub-difficulty PoW, would be a good alternative.
  • Subblock nominees could be selected by largest estimated total fee, or estimated total fee / total size of txs, or some more complicated metric that takes into account changes to the utxo set size.
  • Revision requests for a chain of subblocks could be managed such that the whole chain will be valid when each of the subblocks comes back revised, thus speeding up the rate at which new blocks can be added to the chain.
  • Nearby branch curators will have no overlap in txs submitted, and very little overlap in utxos spent by them (only happens for double spends).

Distributed block verification

To catch up with other miners' blocks, branch curators would download the first few identifying bytes of the txids in their respective branches, to find which txs need to be included in the update.  The ones they don't have are downloaded.  Then in rounds, they would perform collective updates to the tx and utxo trees, so that txs that depend on previous txs will all eventually be covered.  If by the end the tx and utxo tree root hashes match those in the block header, the block is valid.

Future tech: branch curators would instead simply verify a small chain of SCIP proofs :)

Additional note: branch curators can additionally maintain an index of (script: txid:n) for their branch, in order to aid lightweight clients doing lookups by script.
maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1011


View Profile
June 17, 2013, 05:33:14 AM
 #371

An index keyed by (txid:n) will have to be maintained for block validation anyway. My current plan is to have one index (hash(script), txid:n) -> balance for wallet operations, and another (txid:n) -> CCoins for validation.
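For illustration, the two key layouts might be packed like this (Python; the field packing is an assumption for the sketch, and the actual serialization in the utxo-index code may differ):

Code:
def wallet_key(script_hash160, txid, n):
    """(hash(script), txid:n) -> balance.  The constant 20-byte prefix
    groups all of an address's outputs together for wallet range scans."""
    return script_hash160 + txid + n.to_bytes(4, 'big')

def validation_key(txid, n):
    """(txid:n) -> CCoins.  Keyed exactly the way TxIns reference
    outputs, so block validation is a direct lookup per input."""
    return txid + n.to_bytes(4, 'big')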

Transactions within blocks are processed in-order, and so cannot depend on later transactions.

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
etotheipi (OP)
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1093


Core Armory Developer


View Profile WWW
June 17, 2013, 06:08:07 AM
 #372

Completely off topic.  I was thinking that "Reiner-Friedenbach tree" is way too long and way too German.  What about the "Freiner tree"?  Too cheesy?  It does seem to be an elegant mixture of Reiner, Friedenbach, and Freicoin.

I really wanted to rename this thread to something more appropriate, but I'm not sure what, yet. 

Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
d'aniel
Sr. Member
****
Offline Offline

Activity: 461
Merit: 251


View Profile
June 17, 2013, 08:32:20 AM
 #373

An index keyed by (txid:n) will have to be maintained for block validation anyway.
Right, I guess that's why it's the natural keying for distributed nodes.

Quote
My current plan is to have one index (hash(script), txid:n) -> balance for wallet operations, and another (txid:n) -> CCoins for validation.
The question then is which one's digest gets included in the block header?  Having both would be nice, but maintaining the (hash(script): txid:n) one seems to make distributing a node a lot more complex and expensive.  The downside to only having the (txid:n, script) one's digest committed in the block header is that you can't concisely prove a utxo with a given script doesn't exist.  But you can still concisely prove when one does, and that seems to be what's really important.  Also, if only one of the trees needs the authenticating structure, then this would be less overhead.

Quote
Transactions within blocks are processed in-order, and so cannot depend on later transactions.
Okay, but I don't think an individual block really needs to encode an explicit ordering of transactions, as the tx DAG is the same for any valid ordering.  As long as a node can find some ordering that's consistent, then that's good enough.
d'aniel
Sr. Member
****
Offline Offline

Activity: 461
Merit: 251


View Profile
June 17, 2013, 08:53:03 AM
 #374

Completely off topic.  I was thinking that "Reiner-Friedenbach tree" is way too long and way too German.  What about the "Freiner tree"?  Too cheesy?  It does seem to be an elegant mixture of Reiner, Friedenbach, and Freicoin.

I really wanted to rename this thread to something more appropriate, but I'm not sure what, yet. 
How about "Authenticated dictionary of unspent coins"?

I like it 'cause it's self-explanatory :)
maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1011


View Profile
June 17, 2013, 08:54:15 AM
 #375

The downside to only having the (txid:n, script) one's digest committed in the block header is that you can't concisely prove a utxo with a given script doesn't exist.  But you can still concisely prove when one does, and that seems to be what's really important.

No, you need to be able to prove both, otherwise we're right back where we started from in terms of scalability and lightweight clients.

One data structure is needed for creating transactions, the other is required for validating transactions. It's rather silly and myopic to optimize one without the other, and I will accept no compromise - both will be authenticated and committed so long as I have any say in it.

Quote
Transactions within blocks are processed in-order, and so cannot depend on later transactions.
Okay, but I don't think an individual block really needs to encode an explicit ordering of transactions, as the tx DAG is the same for any valid ordering.  As long as a node can find some ordering that's consistent, then that's good enough.

I'm simply reporting how bitcoin works: if you include transactions out of order, your block will not validate.

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
ThomasV
Legendary
*
Offline Offline

Activity: 1896
Merit: 1343



View Profile WWW
June 17, 2013, 08:56:07 AM
 #376

Also, was there a verdict on the 2-way (bitwise) trie vs. 256-way + Merkle trees in each node?  I've been thinking lately about sharding block creation/verification, and am noticing the advantages of the bitwise trie since its updates require a much more localized/smaller set of data.

I guess what really matters is the number of database operations.
If the hash of a node is stored at its parent (and each node stores the hashes of all its children),
then updating the hash of a node requires only one database read and one write, instead of reading all the children (see my previous post).

Electrum: the convenience of a web wallet, without the risks
d'aniel
Sr. Member
****
Offline Offline

Activity: 461
Merit: 251


View Profile
June 17, 2013, 09:20:30 AM
Last edit: June 17, 2013, 07:24:50 PM by d'aniel
 #377

No, you need to be able to prove both, otherwise we're right back where we started from in terms of scalability and lightweight clients.

One data structure is needed for creating transactions, the other is required for validating transactions. It's rather silly and myopic to optimize one without the other, and I will accept no compromise - both will be authenticated and committed so long as I have any say in it.
I'm not sure I follow.  I understand that one of the main benefits of the idea is to be able to prove to a lightweight client that a coin is currently valid; whereas now, when given a Merkle path to a tx in some block, a lightweight client doesn't necessarily know if a txout in this tx was spent in another tx after that block.  That seems like a big improvement.  But the problem isn't that the malicious peer isn't serving data at all; it's that he's serving it in a misleading way.  Isn't it true that either keying ensures peers can't mislead lightweight clients in this way?

Quote
I'm simply reporting how bitcoin works: if you include transactions out of order, your block will not validate.
I appreciate that.  The distributed node idea assumed a couple significant protocol changes, and I just wanted to be sure they wouldn't break anything.
d'aniel
Sr. Member
****
Offline Offline

Activity: 461
Merit: 251


View Profile
June 17, 2013, 09:43:26 AM
Last edit: June 17, 2013, 10:04:00 AM by d'aniel
 #378

Also, was there a verdict on the 2-way (bitwise) trie vs. 256-way + Merkle trees in each node?  I've been thinking lately about sharding block creation/verification, and am noticing the advantages of the bitwise trie since its updates require a much more localized/smaller set of data.

I guess what really matters is the number of database operations.
If the hash of a node is stored at its parent (and each node stores the hashes of all its children),
then updating the hash of a node requires only one database read and one write, instead of reading all the children (see my previous post).

I noted somewhere a while back in this thread that a group of 2-way nodes can be 'pruned' to produce a (e.g.) 256-way one for storage, and a 256-way one can be 'filled in' after retrieving it from storage to produce a group of 2-way ones.  Not sure what the optimal radix size is for storage, but database operations are definitely the primary concern.
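A rough illustration of that pruning step (Python; the Node fields are hypothetical, and re-expanding a leaf that terminates above the cut would need its depth recorded, which is omitted here):

Code:
def collapse(binary_root, depth=8):
    """Walk 'depth' binary levels below binary_root and return the up-to-
    256 surviving descendants, indexed by the byte formed from the path
    bits, giving one 256-way node suitable for storage."""
    wide_children = {}
    stack = [(binary_root, 0, 0)]             # (node, path_bits, level)
    while stack:
        node, path, level = stack.pop()
        if level == depth or node.is_leaf:
            wide_children[path << (depth - level)] = node
            continue
        for bit, child in enumerate(node.children):   # [left, right]
            if child is not None:
                stack.append((child, (path << 1) | bit, level + 1))
    return wide_children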

If a write-back cache policy is used so that the 'trunk' of the tree isn't being constantly rewritten on the disk, then using a bitwise trie would mean not having to rebuild full Merkle trees from scratch for each of the upper nodes during every update.  Not sure if this is a big deal though.

Edit: Never mind about that last point.  Merkle tree leaves are rarely ever added or removed in the upper nodes, so updating the Merkle tree would usually only require changing a single path.
jtimon
Legendary
*
Offline Offline

Activity: 1372
Merit: 1002


View Profile WWW
June 18, 2013, 09:26:00 AM
 #379

About SCIP: if the UTXO tree is in each block, it can be done non-recursively.
Clients would download the whole header chain, not only with the proof of work but also with the "signature" that proves the transition from the previous UTXO set to the current one is correct.

But I don't know SCIP in detail, nor their schedule. So, yes, it would be interesting to have someone explain this stuff, as applied here, first-hand.
Can anyone bring Ben to this conversation? I don't even know his nick on the forums...

2 different forms of free-money: Freicoin (free of basic interest because it's perishable), Mutual credit (no interest because it's abundant)
flipperfish
Sr. Member
****
Offline Offline

Activity: 350
Merit: 251


Dolphie Selfie


View Profile
June 18, 2013, 01:39:44 PM
 #380

Possible? Yes. Desirable? No. It's important that miners verify that they haven't been duped onto a side chain. It is, however, okay for them to throw away those historical transactions once they have been verified and just keep the UTXO set.

Yeah, I did not mention the UTXO set because I thought it's obvious.

The reason I brought this up is that I believe a lot of us are willing to run a USB miner to secure the network without generating any noticeable revenue.  Now that these miners are out and very power-efficient, the power cost of keeping one running is nearly negligible; but if we have to download and store the rapidly growing full chain, the cost may grow significantly.

The miner could validate the entire history or synchronize with constant storage requirements, throwing away data as it is no longer required.

Why can the miner not use only the block headers to verify that he's on the right chain? Is there a reason all the already-spent transactions of the past have to be stored somewhere?
jtimon
Legendary
*
Offline Offline

Activity: 1372
Merit: 1002


View Profile WWW
June 18, 2013, 01:57:04 PM
 #381

Why can the miner not use only the block headers to verify that he's on the right chain? Is there a reason all the already-spent transactions of the past have to be stored somewhere?

Reading only the headers, you can tell which one is the longest chain if you receive several of them. But you want the longest VALID chain.
Here "longest" means the chain with the most proof of work.
You have to validate all the transactions to know that the chain follows the protocol rules.

2 different forms of free-money: Freicoin (free of basic interest because it's perishable), Mutual credit (no interest because it's abundant)
etotheipi (OP)
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1093


Core Armory Developer


View Profile WWW
July 09, 2013, 07:00:18 PM
 #382

So this was "merged" with another topic... I'm not really sure what that means or why it happened.  But I'm not finding the most recent posts.   The last one I see is June 18.  Any idea what happened?  Any way to get all the posts since then?

Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
erk
Hero Member
*****
Offline Offline

Activity: 826
Merit: 500



View Profile
October 10, 2013, 08:40:04 PM
 #383

I just installed a fresh copy of Bitcoin-Qt v0.8.5-beta, and it's taken 2 full days so far to download the blockchain with 10-20 peers most of the time, and it's still only up to August. This strikes me as wrong: if I were running a pool or an exchange it would be a disaster and I would have lots of unhappy users. Not only does the blockchain size need to be looked into, but also the replication method. If I were trying to download the same amount of data as a torrent, it would have finished in hours, not days.
altsay
Sr. Member
****
Offline Offline

Activity: 359
Merit: 250


View Profile
December 06, 2013, 08:47:36 AM
 #384

I wonder, isn't it possible to copy the up-to-date blockchain in the configuration folder and paste it onto another computer?
ShadowOfHarbringer
Legendary
*
Offline Offline

Activity: 1470
Merit: 1005


Bringing Legendary Har® to you since 1952


View Profile
December 06, 2013, 10:52:55 AM
 #385

I wonder, isn't it possible to copy the up-to-date blockchain in the configuration folder and paste it onto another computer?
Yep, already done. Several times.

But (obviously) stop Bitcoin client first.
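For anyone scripting this, a rough sketch (the paths are assumptions -- the data directory differs per OS and client version, and the client must be fully shut down first or the databases will be inconsistent):

Code:
import shutil
from pathlib import Path

SRC = Path.home() / ".bitcoin"              # assumed Linux default datadir
DST = Path("/mnt/backup/bitcoin-datadir")   # hypothetical destination

# Copy only the chain data; wallet.dat is deliberately excluded here --
# back up your wallet separately and carefully, if at all.
for sub in ("blocks", "chainstate"):
    shutil.copytree(SRC / sub, DST / sub)   # destination dirs must not exist yet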

maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1011


View Profile
December 20, 2013, 02:07:17 AM
 #386

I have posted to the development mailing list the first draft of a BIP for a binary authenticated prefix tree, a derivative of the one described in this thread. This starts the process of getting it turned into an official BIP standard. The draft is available for viewing here:

https://github.com/maaku/bips/blob/master/drafts/auth-trie.mediawiki

All comments and criticisms are welcome.
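For readers who haven't opened the draft: the core idea of an authenticated prefix tree is that every node's hash commits to its children, so the root hash commits to the entire key/value set. A toy Python illustration follows (my own simplification, not the BIP's serialization, which has to be unambiguous and length-prefixed):

Code:
import hashlib

def H(data):
    # Double-SHA256, as used elsewhere in Bitcoin.
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

class Node:
    # One node of a binary prefix (radix) tree; keys are bit strings.
    def __init__(self, prefix="", value=None, left=None, right=None):
        self.prefix = prefix                  # compressed edge label, e.g. "0110"
        self.value = value                    # leaf payload (bytes), if any
        self.left, self.right = left, right   # subtrees for next bit 0 / 1

    def digest(self):
        # Commit to the prefix, the payload, and both child digests.
        lh = self.left.digest() if self.left else b"\x00" * 32
        rh = self.right.digest() if self.right else b"\x00" * 32
        return H(self.prefix.encode() + (self.value or b"") + lh + rh)

The root digest is what a committed (or merge-mined) block header would carry; updating one key only rehashes the nodes on its path, which is what makes the structure efficiently updatable.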

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
sugarpuff
Newbie
*
Offline Offline

Activity: 58
Merit: 0


View Profile WWW
March 05, 2014, 06:58:15 PM
Last edit: March 05, 2014, 11:16:27 PM by sugarpuff
 #387

Could etotheipi negativeone (I *just* realized what that says!) or someone knowledgeable comment on how the "rolling-root"/"ledger-solution" impacts this proposal (whether it enhances it, makes it unnecessary, or is wrong for reasons XYZ)?

Summary: keep the blockchain down to some finite size by moving unspent transactions out of old blocks and into new ones at the head via a "rolling-root" mechanism.

Relevant link: https://bitcointalk.org/index.php?topic=501039
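As I read that summary (and only as a hypothetical sketch -- see the linked thread for the actual proposal), the mechanism amounts to something like:

Code:
WINDOW = 1000   # hypothetical retention window, in blocks

def roll(chain):
    # "Rolling root": re-emit the oldest block's still-unspent outputs at
    # the head, after which the oldest block carries nothing anyone needs
    # and can be dropped, keeping the chain at a finite size.
    if len(chain) <= WINDOW:
        return chain
    oldest, head = chain[0], chain[-1]
    survivors = [out for out in oldest["outputs"] if not out["spent"]]
    head["outputs"].extend(survivors)
    return chain[1:]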
Raize
Donator
Legendary
*
Offline Offline

Activity: 1419
Merit: 1015


View Profile
March 05, 2014, 08:14:06 PM
 #388

I have posted to the development mailing list the first draft of a BIP for a binary authenticated prefix tree, a derivative of the one described in this thread. This starts the process of getting it turned into an official BIP standard. The draft is available for viewing here:

https://github.com/maaku/bips/blob/master/drafts/auth-trie.mediawiki

All comments and criticisms are welcome.

I guess my only comment right now is why doesn't this have a BIP number assigned to it? Due to the hard fork requirements?
maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1011


View Profile
March 05, 2014, 08:55:57 PM
 #389

Because I don't assign BIP numbers. There's a process for getting a BIP number assigned, which I am still working through.

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
jtimon
Legendary
*
Offline Offline

Activity: 1372
Merit: 1002


View Profile WWW
March 05, 2014, 08:58:28 PM
 #390

Due to the hard fork requirements?

This would be a softfork, not a hardfork, meaning that a majority of miners must update, but not all users.

2 different forms of free-money: Freicoin (free of basic interest because it's perishable), Mutual credit (no interest because it's abundant)
sugarpuff
Newbie
*
Offline Offline

Activity: 58
Merit: 0


View Profile WWW
June 18, 2014, 10:34:49 PM
 #391

Could somebody here knowledgeable about the details of UTXO please confirm whether my understanding of how it would work in practice is correct?
maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1011


View Profile
June 18, 2014, 10:55:25 PM
 #392

You never ever under any circumstances want to be mining on top of a chain you have not validated the entire history of.

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
doldgigger
Full Member
***
Offline Offline

Activity: 170
Merit: 100


View Profile
June 18, 2014, 11:38:10 PM
 #393

Before I read this, I just want to quickly post that I personally, whether justifiably or not, feel like this is the most pressing issue when it comes to Bitcoin's successful future, and I really hope the core team has planned its order of priorities accordingly.

Why pressing? The blockchain is easy to understand and verify. And the practically required size might be reduced by imposing a fee on old inputs, which would lead wallet-software implementers to adopt per-wallet compression strategies in order to avoid the fee. Having some archival nodes with a full history still wouldn't hurt.

19orEcoqXQ5bzKbzbAnbQrCkQC5ahSh4P9
Feel free to PM me for consulting and development services.
sugarpuff
Newbie
*
Offline Offline

Activity: 58
Merit: 0


View Profile WWW
June 19, 2014, 12:51:02 AM
 #394

You never ever under any circumstances want to be mining on top of a chain you have not validated the entire history of.

Interesting point. Hope you don't mind if I mention your reply in that other thread as well.

So, what is the takeaway from that then? That new lite-nodes can use UTXO to validate arbitrary queries, but they cannot participate in securing the network until they have all the transactions for the leaf nodes of the entire UTXO tree?
maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1011


View Profile
June 19, 2014, 03:09:13 AM
 #395

No, it's worse than that -- you have to download and process every single block since genesis. Otherwise you may be on an invalid chain and not even know it.

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
sugarpuff
Newbie
*
Offline Offline

Activity: 58
Merit: 0


View Profile WWW
June 19, 2014, 03:19:10 AM
 #396

No, it's worse than that -- you have to download and process every single block since genesis. Otherwise you may be on an invalid chain and not even know it.

Can you explain why that is?

If what you're saying is true, then the rolling root is still needed for Bitcoin's future feasibility and safety: http://hackingdistributed.com/2014/06/18/how-to-disincentivize-large-bitcoin-mining-pools/#comment-1442133143
maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1011


View Profile
June 19, 2014, 04:27:18 AM
 #397

Sure I can, now that I'm at my computer instead of my phone ;)

The miner needs to verify the entire block chain history because otherwise he has no way of knowing if he is actually on a valid chain or not. This has nothing to do with UTXO commitments, rolling root, or any other proposal. It's a basic, fundamental requirement of any consensus system: if the miners themselves operate in SPV mode (which you advocate), then anyone -- no matter their hashrate! -- can trick the network into mining an invalid chain. The attacker does so by mining a fork with invalid history and temporarily (by luck or 51%) overcoming the honest network. New miners coming online, or miners tricked into resetting their state, then switch to the invalid chain. This completely invalidates the SPV assumption and makes it unsafe for anybody to use the network.

"Rolling root" doesn't even make sense in the context of bitcoin, as has been explained multiple times by multiple people in your own thread. Let's not bring that discussion here. If your goal is to minimize the amount of data required to bring new non-mining nodes online, then that is what UTXO commitments does.

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
sugarpuff
Newbie
*
Offline Offline

Activity: 58
Merit: 0


View Profile WWW
June 19, 2014, 04:48:56 AM
Last edit: June 19, 2014, 05:14:51 AM by sugarpuff
 #398

The miner needs to verify the entire block chain history because otherwise he has no way of knowing if he is actually on a valid chain or not. This has nothing to do with UTXO commitments, rolling root, or any other proposal. It's a basic, fundamental requirement of any consensus system: if the miners themselves operate in SPV mode (which you advocate)

Full stop. I have never advocated that. If you believe that, then you never understood the rolling root proposal.

I agree it's best to keep the threads separate. If you have questions about how rolling root works feel free to ask in the other thread.

Sticking to UTXO though, I don't believe you answered my question. UTXO is not SPV either, so it is still not clear to me why you say these nodes don't know whether they're on a valid chain or not. They downloaded the headers from genesis (SPV), but in addition to that they downloaded the entire UTXO meta chain, which they can then use to verify any txn and build the merkle/b-tree or whatever the latest data structure is.
doldgigger
Full Member
***
Offline Offline

Activity: 170
Merit: 100


View Profile
June 19, 2014, 06:36:26 PM
 #399

No, it's worse than that -- you have to download and process every single block since genesis. Otherwise you may be on an invalid chain and not even know it.

People might speed up that process by relying on checkpoint hashes built into the client. Since most cryptocurrency client software is developed using git (which also comes with a cryptographically secured history), detecting manipulations there is also practical in most cases.
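Roughly, those built-in checkpoints are just a hard-coded table of expected block hashes at fixed heights, checked during sync. A sketch (the heights and hashes below are placeholders, not values from any real client):

Code:
CHECKPOINTS = {
    100000: "hash_expected_at_height_100000",   # placeholder
    200000: "hash_expected_at_height_200000",   # placeholder
}

def passes_checkpoints(height, block_hash):
    # Any chain that disagrees with a checkpoint is rejected outright;
    # blocks at or below the last checkpoint can then get cheaper checks.
    expected = CHECKPOINTS.get(height)
    return expected is None or expected == block_hash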

19orEcoqXQ5bzKbzbAnbQrCkQC5ahSh4P9
Feel free to PM me for consulting and development services.
maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1011


View Profile
June 19, 2014, 08:02:18 PM
 #400

Checkpoints are a temporary hack that will go away soon, we hope.

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
sugarpuff
Newbie
*
Offline Offline

Activity: 58
Merit: 0


View Profile WWW
June 20, 2014, 01:26:59 AM
 #401

Checkpoints are a temporary hack that will go away soon, we hope.

Maaku, I take your silence to indicate that you believe I am wrong but do not (for whatever reason) want to explain why you think so.

Or... well, actually that's my only hypothesis.

For my part, I do not see how you can be right, and I provided my reasons for why I think that. As far as I can tell, once the leaves have all been fetched, miners can safely mine on the blockchain without having to download the histories of all those coins.
maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1011


View Profile
June 20, 2014, 06:59:36 AM
 #402

I explained myself, twice:

The miner needs to verify the entire block chain history because otherwise he has no way of knowing if he is actually on a valid chain or not. This has nothing to do with UTXO commitments, rolling root, or any other proposal. It's a basic, fundamental requirement of any consensus system: if the miners themselves operate in SPV mode (which you advocate), then anyone -- no matter their hashrate! -- can trick the network into mining an invalid chain. The attacker does so by mining a fork with invalid history and temporarily (by luck or 51%) overcoming the honest network. New miners coming online, or miners tricked into resetting their state, then switch to the invalid chain. This completely invalidates the SPV assumption and makes it unsafe for anybody to use the network.

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
doldgigger
Full Member
***
Offline Offline

Activity: 170
Merit: 100


View Profile
June 20, 2014, 01:21:19 PM
 #403

Checkpoints are a temporary hack that will go away soon, we hope.

How is that? Performance suddenly considered a bad thing?

19orEcoqXQ5bzKbzbAnbQrCkQC5ahSh4P9
Feel free to PM me for consulting and development services.
sugarpuff
Newbie
*
Offline Offline

Activity: 58
Merit: 0


View Profile WWW
June 20, 2014, 01:32:19 PM
Last edit: June 20, 2014, 02:44:54 PM by sugarpuff
 #404

I explained myself, twice:

The miner needs to verify the entire block chain history because otherwise he has no way of knowing if he is actually on a valid chain or not. This has nothing to do with UTXO commitments, rolling root, or any other proposal. It's a basic, fundamental requirement of any consensus system: if the miners themselves operate in SPV mode (which you advocate), then anyone -- no matter their hashrate! -- can trick the network into mining an invalid chain. The attacker does so by mining a fork with invalid history and temporarily (by luck or 51%) overcoming the honest network. New miners coming online, or miners tricked into resetting their state, then switch to the invalid chain. This completely invalidates the SPV assumption and makes it unsafe for anybody to use the network.

In what universe is that an answer to the facts that I pointed out?

You're continuing to ignore that:

1. I did not advocate SPV.
2. UTXO is not SPV.
3. Rolling root is not SPV.

The 51% attack you describe would have *equal* impact on nodes with the full history (so far as I understood what you're describing).

You never explained—anywhere—how UTXO "full-leaf nodes" are different in any meaningful manner from nodes with complete transaction histories.

Still waiting for that reply.
instagibbs
Member
**
Offline Offline

Activity: 114
Merit: 12


View Profile
June 20, 2014, 02:44:55 PM
 #405



Still waiting for that reply.

Let's say we only have one ledger block, or whatever, or UTXO commitment thing. If we don't know the blocks before that, other than just headers, we can't verify that people are building on valid blocks. Someone could have made a fraudulent ledger block, and built on top of it. If people don't catch this, and never check contents before that, they have successfully attacked the network.

I think of it as SPV++ security: we are saying the ledger blocks/UTXO commitments are "secure" based on how deep in the chain they are, rather than on having validated everything beneath them. The only way to know the chain is valid is to start 100% from the beginning and work your way through until you reach the current height.
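For the depth-based part, the bitcoin paper's own gambler's-ruin analysis quantifies it: the chance that an attacker with fraction q of the hashpower ever overtakes a block buried z deep is (q/p)^z with p = 1 - q (and 1 if q >= p). In Python:

Code:
def catch_up_probability(q, z):
    # Probability an attacker controlling fraction q of the hashpower ever
    # overtakes the honest chain from z blocks behind (Nakamoto, section 11).
    p = 1.0 - q
    return 1.0 if q >= p else (q / p) ** z

print(catch_up_probability(0.10, 6))   # ~1.9e-6 for a 10% attacker, depth 6

So "deep enough" commitments are overwhelmingly hard to rewrite, but the guarantee is probabilistic, which is exactly the difference from having validated everything yourself.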
sugarpuff
Newbie
*
Offline Offline

Activity: 58
Merit: 0


View Profile WWW
June 20, 2014, 02:59:12 PM
Last edit: June 20, 2014, 03:13:53 PM by sugarpuff
 #406

Let's say we only have one ledger block, or whatever, or UTXO commitment thing. If we don't know the blocks before that, other than just headers, we can't verify that people are building on valid blocks. Someone could have made a fraudulent ledger block, and built on top of it. If people don't catch this, and never check contents before that, they have successfully attacked the network.

(For any audience: "ledger block" is the same thing as "rolling root".)

Thank you for the reply (and trying to address my questions)! I'm not convinced that is correct, however. How would this attacker create a convincing fraudulent ledger block that beats the ledger block of the honest nodes?

1. The ledger block of the honest nodes will contain transactions that can be verified to have thousands of confirmations.
2. The ledger block of the honest nodes will be part of a longer chain than the fraudulent one for the same reasons described in the bitcoin paper.

If I'm wrong, I must be missing something subtle, and I would appreciate it if you (or anyone else) could explain what that is.
sugarpuff
Newbie
*
Offline Offline

Activity: 58
Merit: 0


View Profile WWW
June 20, 2014, 03:14:18 PM
 #407

Edited point 1 in above post of mine to state:

1. The ledger block of the honest nodes will contain transactions that can be verified to have thousands of confirmations.
instagibbs
Member
**
Offline Offline

Activity: 114
Merit: 12


View Profile
June 20, 2014, 03:22:52 PM
 #408

Edited point 1 in above post of mine to state:

1. The ledger block of the honest nodes will contain transactions that can be verified to have thousands of confirmations.

Ok, well again, this is still SPV-like security (block depth). That's the definition of the security model. Practically you might feel it's ok, but it's certainly a different security model than Bitcoin full-node security.
sugarpuff
Newbie
*
Offline Offline

Activity: 58
Merit: 0


View Profile WWW
June 20, 2014, 04:15:36 PM
Last edit: June 20, 2014, 04:29:33 PM by sugarpuff
 #409

(For any audience, see SPV in Thin client security.)

Ok, well again, this is still SPV-like security (block depth). That's the definition of the security model. Practically you might feel it's ok, but it's certainly a different security model than Bitcoin full-node security.

In that it is using block depth it is like SPV (this I already acknowledged in a previous post), but it is not the same, because of the extra information in the UTXO; and it is not the same as rolling-root/ledger-block, because the current root is guaranteed not to have any relevant transactions prior to it.

Please be specific. What does the attack scenario look like with UTXO and/or ledger-block, and why does it require the full transaction history?

The wiki seems to agree with me actually:

Quote
If such UOT hashes were included in the block chain, a client which shipped with a checkpoint block that had a UOT would only need to download blocks after the checkpoint. Moreover, once the client had downloaded those blocks and confirmed their UOTs, it could discard all but the most recent block containing a UOT.

https://en.bitcoin.it/wiki/Thin_Client_Security#Unused_Output_Tree_in_the_Block_chain_.28UOT.29

Edit: Although it is saying something slightly different (shipping clients with the full UTXO already intact), I don't think that's necessary with rolling root. I would have to give it more thought, but it might not be necessary for UTXO either. If there's a problem for UTXO, it could be fixed with a rolling-root type solution.
instagibbs
Member
**
Offline Offline

Activity: 114
Merit: 12


View Profile
June 20, 2014, 04:27:46 PM
 #410

It's the difference between saying:

1) I believe the UTXO checkpoint is valid because people built on top of it. (SPV-like, depth-based)

and

2) I don't have to trust any checkpoint, I computed the address balances from the beginning of time.

I'm not even trying to argue about a practical attack, I'm simply explaining why they're different. They are.

sugarpuff
Newbie
*
Offline Offline

Activity: 58
Merit: 0


View Profile WWW
June 20, 2014, 04:35:29 PM
 #411

It's the difference between saying:

1) I believe the UTXO checkpoint is valid because people built on top of it. (SPV-like, depth-based)

With a UTXO checkpoint built into the client, safety is guaranteed (so far as I can tell) if the checkpoint is valid.

That is the scenario the wiki discusses. The scenario I originally asked about is downloading the entire UTXO chain (not a checkpoint of it).

Quote
I'm not even trying to argue about a practical attack, I'm simply explaining why they're different. They are.

They are different, but if the difference isn't meaningful in terms of security for the chain, then there's no issue, and UTXO and/or ledger-block can be used instead of downloading everything from genesis. (This is good news, btw! You should want to make it work; otherwise bitcoin won't be sustainable.)
warpio
Member
**
Offline Offline

Activity: 110
Merit: 10



View Profile
June 20, 2014, 06:24:27 PM
 #412

If the client has a built-in feature that takes a checkpoint of the UTXO every so often based on the longest valid blockchain, and the code for that feature is well documented and understood, then I think there would be no problem with trusting the built-in UTXO checkpoints.
sugarpuff
Newbie
*
Offline Offline

Activity: 58
Merit: 0


View Profile WWW
June 20, 2014, 07:18:04 PM
Last edit: June 20, 2014, 07:28:27 PM by sugarpuff
 #413

I think it would be helpful (in general) for me to paste here my updated understanding of how these "full-security new lite-nodes" would be booted up:

Quote
For a new node to boot up from scratch and begin securing the network, approximately the following needs to happen:

1. Download entire BTC blockchain headers.
2. Download the entire UTXO meta chain.
3. Begin merged mining on the UTXO and BTC blockchains *only* once we've built up the complete UTXO tree.
4. For each new transaction the "almost-full node" receives, query other nodes for the block in which the payer's bitcoins were previously located, along with the hashes to verify the merkle root in the BTC blockchain. Verify the root can be found. Edit: This part is just regular SPV!
5. Query other nodes for the transactions and merkle-branch hashes that result in the merkle root of the most recent block in the UTXO blockchain. Verify that the previous txn the coins belong to exists in the current data comprising the most recent merkle root of the UTXO blockchain.
6. If the transaction passes the above verifications, store all received data so far in order to build the Merkle hash tree that represents all the UTXOs.

To build the complete UTXO tree, we ask for any missing children of our current UTXO merkle tree and verify them according to the above (a sketch of that verification step follows).
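A minimal, runnable sketch of that verification step (step 5): checking that a fetched leaf hashes up to the committed root. Everything here is my own illustration; the actual tree and serialization would follow whatever spec gets adopted:

Code:
import hashlib

def H(data):
    # Double-SHA256.
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def verify_branch(leaf, branch, root):
    # `branch` is a list of (sibling_hash, sibling_is_right) pairs giving
    # the path from the leaf up to the root.
    h = H(leaf)
    for sibling, sibling_is_right in branch:
        h = H(h + sibling) if sibling_is_right else H(sibling + h)
    return h == root

The node repeats this for every branch it fetches (step 6); once no children are missing, the whole tree is known and, per step 3, mining can begin.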
sugarpuff
Newbie
*
Offline Offline

Activity: 58
Merit: 0


View Profile WWW
June 20, 2014, 07:27:04 PM
 #414

Erm, just a note, I forgot to remove Step 7 when I copied that post:

Quote
7. Continue mining blocks as per above to secure the network and build new blocks, slowly transforming this light node into a full node.

I deleted that. Mining will commence only *after* our local UTXO merkle tree has been completed.
maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1011


View Profile
June 21, 2014, 03:54:00 PM
 #415

That appears to be a correct description of SPV+ mode.

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
sugarpuff
Newbie
*
Offline Offline

Activity: 58
Merit: 0


View Profile WWW
June 21, 2014, 07:04:52 PM
 #416

That appears to be a correct description of SPV+ mode.

OK cool, thanks maaku for reviewing it! :)

Then it sounds like UTXO / SPV+ is a solution to the long-running problem of safely and quickly bringing new bitcoin nodes online. That's really great news if true! :D

So who's working on this? I haven't seen etotheipi post anything recently; is he still working on it? Does he still need donations?
maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1011


View Profile
June 22, 2014, 12:25:48 AM
 #417

I don't know of anyone besides me who is still working on UTXO commitments. Progress is limited by developer time right now, since for about six months I've been split between multiple projects trying to make ends meet. I posted a summary of the current state earlier in this thread, and any bitcoins, or bitcoind developers, would be appreciated in speeding the process along.

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
sugarpuff
Newbie
*
Offline Offline

Activity: 58
Merit: 0


View Profile WWW
June 22, 2014, 12:46:25 AM
 #418

I don't know of anyone besides me who is still working on UTXO commitments. Progress is limited by developer time right now, since for about six months I've been split between multiple projects trying to make ends meet. I posted a summary of the current state earlier in this thread, and any bitcoins, or bitcoind developers, would be appreciated in speeding the process along.

Mmmk, sent ya a PM.
ittayEyal
Newbie
*
Offline Offline

Activity: 4
Merit: 0


View Profile
June 23, 2014, 03:31:43 PM
 #419

I guess I'm not sure what the ultimate goal of this is. Do you want to actually prune the blockchain prefix at some point, or is this just a mechanism to speed up bootstrapping? My feeling is that this mechanism is secure enough for the latter purpose, but not for the former.

To verify that a UTXO set i includes all utxos, the verifier has to go back to the latest UTXO set it trusts, j, and make sure there are no missing outputs between j and i. There is no way to do that once the blockchain gets pruned at i.

Technically, it's possible to lose utxos this way, if the network wrongfully accepts a partial UTXO set and prunes the prefix. That being said, it's quite difficult to take advantage of this vulnerability, so I think it is viable for fast bootstrapping.

Or perhaps I'm missing a part of the mechanism? Please correct me if I'm wrong.
sugarpuff
Newbie
*
Offline Offline

Activity: 58
Merit: 0


View Profile WWW
June 24, 2014, 01:47:17 AM
 #420

I guess I'm not sure what the ultimate goal of this is. Do you want to actually prune the blockchain prefix at some point, or is this just a mechanism to speed up bootstrapping? My feeling is that this mechanism is secure enough for the latter purpose, but not for the former.

What do you mean "not for the former"? This scheme would make the prefix irrelevant. If you don't care about the history of those coins there's no issue (that I can see).

Quote
To verify that a UTXO set i includes all utxos, the verifier has to go back to the latest UTXO set it trusts, j, and make sure there are no missing outputs between j and i. There is no way to do that once the blockchain gets pruned at i.

Technically, it's possible to lose utxos this way, if the network wrongfully accepts a partial UTXO set and prunes the prefix. That being said, it's quite difficult to take advantage of this vulnerability, so I think it is viable for fast bootstrapping.

Or perhaps I'm missing a part of the mechanism? Please correct me if I'm wrong.

I think either I've misunderstood what you're saying here, or you've misunderstood how this would work. Please let me know either way.

Once a node has a complete UTXO tree that corresponds to the most recently accepted block, there should be no problems from that point on. It does not go back to the UTXO chain to verify much of anything, but merely updates its local tree based on newly minted blocks.

The blockchain can be pruned locally according to any lossless scheme the node desires. The only place I see a potential issue is if there's a reorganization (a different fork suddenly exceeds the current one in length). If that happens, and the node pruned too close to the current time, then it might not know how to safely deal with the reorg event. To mitigate this, nodes can keep a blockchain prefix of sufficient length.
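Sketching that last point (the retention depth is an assumption -- nothing in this thread pins down a safe prefix length):

Code:
REORG_SAFETY = 288   # assumed ~2 days of blocks kept for reorg handling

class PrunedStore:
    # Keep the UTXO set plus only the last REORG_SAFETY full blocks.
    def __init__(self):
        self.recent = []   # full blocks, newest last
        self.utxo = {}     # (txid, vout) -> amount

    def connect(self, block):
        for tx in block["txs"]:
            for spent in tx["inputs"]:                # spend old outputs...
                self.utxo.pop(spent, None)
            for i, amount in enumerate(tx["outputs"]):
                self.utxo[(tx["txid"], i)] = amount   # ...and add new ones
        self.recent.append(block)
        if len(self.recent) > REORG_SAFETY:
            self.recent.pop(0)   # a reorg deeper than this forces a re-sync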
ittayEyal
Newbie
*
Offline Offline

Activity: 4
Merit: 0


View Profile
June 24, 2014, 03:01:28 PM
 #421

I'm not sure who misunderstands. I'll try to rephrase.

Here's the core of my question - does the system (all nodes) forget a prefix of the chain at some point?

If a node reads the entire chain (from genesis), it can prune it locally, sure. But how does a new node bootstrap without the entire chain? It needs to trust a snapshot (rolling root, utxo block, whatever it's called). That's my issue.
sugarpuff
Newbie
*
Offline Offline

Activity: 58
Merit: 0


View Profile WWW
June 24, 2014, 03:13:34 PM
 #422

I'm not sure who misunderstands. I'll try to rephrase.

Here's the core of my question - does the system (all nodes) forget a prefix of the chain at some point?

If a node reads the entire chain (from genesis), it can prune it locally, sure. But how does a new node bootstrap without the entire chain? It needs to trust a snapshot (rolling root, utxo block, whatever it's called). That's my issue.

Rolling root and UTXO are very different architecturally, so I'll just answer for UTXO here.

Nobody, whether it's new nodes or old nodes, needs to know the entire histories of coins. The only thing that matters is whether they have the accurate unspent transactions.

New nodes download the entire UTXO meta chain (step #2 in the summary I posted earlier). This chain is protected by PoW. That's it. By knowing the accurate UTXO tree fingerprint, they can safely build the UTXO tree.
ittayEyal
Newbie
*
Offline Offline

Activity: 4
Merit: 0


View Profile
June 25, 2014, 12:58:07 AM
 #423

New nodes download the entire UTXO meta chain (step #2 in the summary I posted earlier). This chain is protected by PoW. That's it. By knowing the accurate UTXO tree fingerprint, they can safely build the UTXO tree.

Ok, I'm convinced. As long as the UTXO meta chain is verified by the system, that's fine. I thought initially that it's a separately maintained data structure. If it's integrated in the Blockchain, in the sense that an invalid utxo would cause the entire Blockchain block to be rejected, then it's fine. In this case you're not adding any principles to rely on, and security is essentially intact as far as I see.

instagibbs
Member
**
Offline Offline

Activity: 114
Merit: 12


View Profile
June 25, 2014, 01:02:32 AM
 #424

Ok, I'm convinced. As long as the UTXO meta chain is verified by the system, that's fine. I thought initially that it's a separately maintained data structure. If it's integrated in the Blockchain, in the sense that an invalid utxo would cause the entire Blockchain block to be rejected, then it's fine. In this case you're not adding any principles to rely on, and security is essentially intact as far as I see.



I don't think validity of commitment touches validity of Bitcoin block. Bitcoin nodes are allowed to completely ignore this merge-mined chain.
ittayEyal
Newbie
*
Offline Offline

Activity: 4
Merit: 0


View Profile
June 25, 2014, 01:33:40 AM
 #425

I don't think validity of commitment touches validity of Bitcoin block. Bitcoin nodes are allowed to completely ignore this merge-mined chain.

Then these Bitcoin nodes (aka Bitcoin) should not truncate the chain. When you truncate up to some point, you trust that you have a valid snapshot up to that time. There is no way to verify the UTXO chain (in the sense that there can be missing transactions) once the prefix is gone. So if someone manages to slip an invalid utxo into the blockchain, and this error is discovered after the prefix is gone, it invalidates every node that forgets prefixes.

So it could work as a fast-bootstrapping probably-reliable mechanism for your home PC. Something between SPV and full wallet. But to be done in full nodes the snapshot mechanism has to be incorporated in the blockchain.
jtimon
Legendary
*
Offline Offline

Activity: 1372
Merit: 1002


View Profile WWW
July 05, 2014, 09:39:25 AM
 #426

I'm not sure who misunderstands. I'll try to rephrase.

Here's the core of my question - does the system (all nodes) forget a prefix of the chain at some point?

If a node reads the entire chain (from genesis), it can prune it locally, sure. But how does a new node bootstrap without the entire chain? It needs to trust a snapshot (rolling root, utxo block, whatever it's called). That's my issue.

Rolling root and UTXO are very different architecturally, so I'll just answer for UTXO here.

Nobody, whether it's new nodes or old nodes, needs to know the entire histories of coins. The only thing that matters is whether they have the accurate unspent transactions.

New nodes download the entire UTXO meta chain (step #2 in the summary I posted earlier). This chain is protected by PoW. That's it. By knowing the accurate UTXO tree fingerprint, they can safely build the UTXO tree.

No offense, but I'm not sure you understand "UTXO" well enough to explain it to others. For starters, UTXO is nothing new; this post is about "committed utxo".
As maaku explained, miners need to know that they're mining on top of the valid chain; that's the only way they can know they have the accurate UTXO set.

I don't think validity of commitment touches validity of Bitcoin block. Bitcoin nodes are allowed to completely ignore this merge-mined chain.

There are two ways to implement this: as a merged-mined chain or as a softfork. The softfork would invalidate blocks with an invalid utxo root; with the merged-mined chain, their validity is independent. I prefer the second option: the mini-chain just seems like a mechanism to deploy it, and until you make the softfork this is not that useful.

Then these Bitcoin nodes (aka Bitcoin) should not truncate the chain. When you truncate up to some point, you trust that you have a valid snapshot up to that time. There is no way to verify the UTXO chain (in the sense that there can be missing transactions) once the prefix is gone. So if someone manages to slip an invalid utxo into the blockchain, and this error is discovered after the prefix is gone, it invalidates every node that forgets prefixes.

So it could work as a fast-bootstrapping probably-reliable mechanism for your home PC. Something between SPV and full wallet. But to be done in full nodes the snapshot mechanism has to be incorporated in the blockchain.

Not sure what you mean by "prefix" here; it seems you mean the past chain.
With committed utxo, SPV nodes can be much more secure, but full nodes still need to fully validate the chain (otherwise they would have SPV security, not full-node security).
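The difference between the two deployments can be stated in a few lines of Python (compute_utxo_root below is a toy stand-in for the real authenticated-tree root, and the block fields are hypothetical):

Code:
import hashlib

def compute_utxo_root(utxo_set):
    # Toy commitment: hash the sorted set. The real thing would be the
    # authenticated tree root discussed in this thread.
    h = hashlib.sha256()
    for key in sorted(utxo_set):
        h.update(repr((key, utxo_set[key])).encode())
    return h.digest()

def softfork_valid(block, utxo_set):
    # Soft-fork rule: upgraded miners orphan any block whose committed
    # root doesn't match the real UTXO set; old nodes still accept it.
    return block["utxo_root"] == compute_utxo_root(utxo_set)

# Under the merged-mined deployment, by contrast, this check never gates
# Bitcoin block validity -- a bad commitment only invalidates the entry
# on the separate mini-chain, which Bitcoin nodes may ignore entirely.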

2 different forms of free-money: Freicoin (free of basic interest because it's perishable), Mutual credit (no interest because it's abundant)
sugarpuff
Newbie
*
Offline Offline

Activity: 58
Merit: 0


View Profile WWW
July 05, 2014, 05:31:30 PM
 #427

No offense, but I'm not sure you understand "UTXO" well enough to explain it to others. For starters, UTXO is nothing new; this post is about "committed utxo". As maaku explained, miners need to know that they're mining on top of the valid chain; that's the only way they can know they have the accurate UTXO set.

maaku confirmed that I understood UTXO's SPV+ mode.
jtimon
Legendary
*
Offline Offline

Activity: 1372
Merit: 1002


View Profile WWW
July 06, 2014, 11:56:50 AM
 #428

maaku confirmed that I understood UTXO's SPV+ mode.

Maybe you understand it, I don't know; but some of the terms you were using are unusual, and I thought some people could get confused, that's all.

2 different forms of free-money: Freicoin (free of basic interest because it's perishable), Mutual credit (no interest because it's abundant)
maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1011


View Profile
July 06, 2014, 12:01:17 PM
 #429

No offense, but I'm not sure you understand "UTXO" well enough to explain it to others. For starters, UTXO is nothing new; this post is about "committed utxo". As maaku explained, miners need to know that they're mining on top of the valid chain; that's the only way they can know they have the accurate UTXO set.

maaku confirmed that I understood UTXO's SPV+ mode.

Jorge is correct: miners have to know they are building on top of a valid chain, and to do that they need to validate the entire block chain history. The post you linked to endorsed a statement which included this caveat, which your most recent explanation does not.

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
paulkernfeld
Newbie
*
Offline Offline

Activity: 10
Merit: 0


View Profile WWW
January 09, 2016, 02:09:03 PM
 #430

Is there a plan to deal with parties that want to store arbitrary data in the blockchain? Right now, OP_RETURN is used as a way to prevent non-financial data from bloating UTXO trees. However, if someone wants to store arbitrary data in the blockchain, ultimate blockchain compression might encourage them to store this data by using fake addresses, because that way they would get much faster lookups and they could basically use this to build an efficient key-value store. Is this just accepted as something that will inevitably happen?
ripper234
Legendary
*
Offline Offline

Activity: 1358
Merit: 1003


Ron Gross


View Profile WWW
June 12, 2019, 09:32:55 PM
 #431

FYI - Back in 2012 there was a bounty of 5.725 BTC (https://bitcointalk.org/index.php?topic=93606.0) (and eventually BCH) dedicated to this project, of which I'm the custodian.
As nobody claimed the bounty, I plan to find a suitable project or projects that deal with Bitcoin scalability and transfer the coins to them.

Check out the bounty thread for updates; I expect a decision to be made within a couple of weeks.

Please do not pm me, use ron@bitcoin.org.il instead
Mastercoin Executive Director
Co-founder of the Israeli Bitcoin Association