Bitcoin Forum
Author Topic: Ultimate blockchain compression w/ trust-free lite nodes  (Read 87767 times)
etotheipi
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1048


Core Armory Developer


View Profile WWW
June 17, 2012, 06:33:40 PM
Last edit: January 20, 2014, 04:54:23 PM by etotheipi
Merited by ETFbitcoin (32), suchmoon (4)
 #1

This idea has been scattered throughout some other threads, but there is no one place that fully explains the idea with pictures.  I believe this relieves two major problems with the network at once -- compression/pruning, and lightweight node security -- and does so in a non-disruptive way.  I am not positive that this is the right way to go, but it definitely warrants discussion.



Summary:  [SEE ILLUSTRATIONS BELOW]

Use a special tree data structure to organize all unspent-TxOuts on the network, and use the root of this tree to communicate its "signature" between nodes.  The leaves of this tree correspond to addresses/scripts, and the data at each leaf is the root of the unspent-TxOut list for that address/script.  To maintain security of the tree signatures, the root will be included in the header of an alternate blockchain, which will be secured by merged mining.  

This provides the same compression as the simpler unspent-TxOut merkle tree, but also gives nodes a way to download just the unspent-TxOut list for each address in their wallet, and verify that list directly against the blockheaders.  Therefore, even lightweight nodes can get full address information, from any untrusted peer, and with only a tiny amount of downloaded data (a few kB).  

(NOTE:  I have illustrated everything as using straight merkle-trees, but as noted in the downsides/uncertainties section: a variant of the merkle-tree will have to be used that guarantees efficient updating of the tree.)


(1) Major Benefits:
  • (1a) Near-optimal blockchain compression:  theoretically, the size of the pruned blockchain would be proportional to the transaction volume (so it could go up or down), instead of to the entire global history, which always increases in size.  In practice it wouldn't be so clean, but you really won't do any better than this.  Whoops! Before this idea was fully developed, I had overlooked the fact that full nodes will still have to maintain the transaction-indexed database.  This address-indexed DB is not a replacement, but would have to exist in addition to it.  Therefore, it necessarily increases the amount of work and data storage of a full node.  But it can simply be an "add-on" to an existing "ultraprune" implementation.  (Either way, this should really be counted as a downside.)
  • (1b) Trustless lightweight-node support:  New nodes entering the network for the first time will only have to download a tiny amount of data to get full, verifiable knowledge of their balance and how to spend it (much of which can be stored between loads).  A single honest peer out of thousands guarantees you get, and recognize, good data.
  • (1c) Perfectly non-disruptive:  There are no main-network protocol or blockchain changes at all.  All the balance-tree information is maintained and verified in a separate blockchain through merged mining.  In fact, it's so non-disruptive, it could be implemented without any core-dev support at all (though I/we would like their involvement).
  • (1d) Efficient tree querying & updating:  The full-but-pruned nodes of the network will be able to maintain this data structure efficiently.  New blocks simply add or remove unspent coins from the tree, and all operations are "constant time and space" (there is an upper limit on how much time and space is required to prove inclusion of, insert, or delete a piece of data, no matter how big the network is).
  • (1e) No user-setup or options:  Unlike overlay networks, achieving full trust does not require finding a trusted node, or subscribing to a service.  Just like the main blockchain -- you find a bunch of random peers and get the longest chain.  This could be bootstrapped in a similar fashion as the main network.

(2) Downsides and Uncertainties:
  • (1a) See revised (1a) above
  • (2a) Complexity of concept:  This is not simple.  It's a second blockchain, requiring merged mining -- though if it is successful and supported by the community, it could be added to the network by requiring that miners compute and include the root hash of this data structure in the coinbase script (just like with block height).  This is entirely feasible, but it could be a bear to implement.
  • (2b) Uncertainties about lite-node bootstrap data:  Depending on how the data is structured, there may still be a bit of data for a lite node to download to get the full security of a full node.  It will, undoubtedly, be much less than downloading the entire chain.  But there are obvious implications if this security comes at the cost of 1 MB/wallet, or 100 MB/wallet (still better than 4 GB, as of this writing).  UPDATE: My initial estimate based on the "Hybrid PATRICIA/Briandais Tree" (aka Reiner-Tree) is that a wallet with 100 addresses could verify its own balance with about 250 kB.
  • (2c) [SEE UPDATE AT BOTTOM] Merkle-tree Alternative Needed: Vanilla merkle-trees will not work, because adding or removing single branches is likely to cause complete recomputation of the tree.  But it should be possible to create an alternative with the following properties:
    • Commutative computation:  a node should be able to get the same answer regardless of whether the tree is computed from scratch, or is based on updating a previous tree.
    • O(log(N)) updating: removing or adding a single leaf node should be able to be done in O(log(N)) time.  With a vanilla merkle tree, this is true only if you remove a node and add a node to the same leaf location.

(3)  Assumptions:
  • (3a) Need verifiable tree roots:  I argue that a regular overlay network won't suffice, solely because it's too easy for malicious nodes to spread incorrect data and muck up the network.  If there's enough malicious nodes in an overlay network, it could make lite nodes that depend on it unusable.  I am assuming it is necessary to have a verifiable source for pruned-headers -- a separate blockchain succeeds because correctness of data is required to be accepted.
  • (3b) Merged mining does what we think it does: It is a secure way to maintain a separate blockchain, leveraging existing mining power.  
  • (3c) Efficient sorting:  Leaf nodes of the main tree will have to be sorted so that all nodes can arrive at the same answer.  However, this can be done using bucket-sort in O(N) time, because the leaf nodes are hashes which should be uniformly distributed.



Alt-Chain Merkle Tree construction:

-- For each address/script, collect all unspent-TxOuts
-- Compute merkle root of each TxOut tree
-- Sort roots, use as leaf nodes for a master-merkle-tree.  
-- Include merkle-root of master tree in alternate chain header.
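
To make the construction concrete, here is a minimal Python sketch of the two-level tree (assuming double-SHA256 and a plain merkle tree; the serialization, the sorting rule, and the name utxos_by_script are illustrative only, and this ignores the efficient-update problem in (2c)):

Code:
# Sketch of the two-level construction above: one merkle tree per
# address/script, whose roots become leaves of a master merkle tree.
# Hashing/serialization choices here are illustrative, not a spec.
import hashlib

def dhash(b):
    return hashlib.sha256(hashlib.sha256(b).digest()).digest()

def merkle_root(leaves):
    if not leaves:
        return dhash(b'')
    layer = list(leaves)
    while len(layer) > 1:
        if len(layer) % 2:                      # duplicate last hash on odd layers
            layer.append(layer[-1])
        layer = [dhash(layer[i] + layer[i+1]) for i in range(0, len(layer), 2)]
    return layer[0]

def master_root(utxos_by_script):
    """utxos_by_script: dict mapping scriptPubKey bytes -> list of unspent-TxOut hashes."""
    leaves = []
    for script, txout_hashes in utxos_by_script.items():
        subtree_root = merkle_root(sorted(txout_hashes))   # per-address sub-merkle root
        leaves.append(dhash(script + subtree_root))        # leaf commits to script + its UTXO set
    return merkle_root(sorted(leaves))                     # sorted so every node gets the same root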


https://dl.dropboxusercontent.com/u/1139081/BitcoinImg/ReinerAltChain/reinercompression.png



Getting your balance:

-- Download headers of both chains
-- Request unspent-TxOut-hash list.  
-- Compute sub-merkle root for this address
-- Request secondary-branch nodes  (O(log(N)))
-- Compute master root; compare to block header
-- Request the full TxOuts for each unspent-TxOut-hash above
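
And a rough sketch of the corresponding lite-node check, with the same helpers repeated so the snippet stands alone (the branch format -- a list of (sibling_hash, sibling_is_right) pairs -- is an assumption for illustration only):

Code:
# Lite-node check of one address's unspent-TxOut list against the
# master root committed in the alt-chain header.
import hashlib

def dhash(b):
    return hashlib.sha256(hashlib.sha256(b).digest()).digest()

def merkle_root(leaves):
    if not leaves:
        return dhash(b'')
    layer = list(leaves)
    while len(layer) > 1:
        if len(layer) % 2:
            layer.append(layer[-1])
        layer = [dhash(layer[i] + layer[i+1]) for i in range(0, len(layer), 2)]
    return layer[0]

def apply_branch(leaf_hash, branch):
    h = leaf_hash
    for sibling, sibling_is_right in branch:
        h = dhash(h + sibling) if sibling_is_right else dhash(sibling + h)
    return h

def verify_address_utxos(script, txout_hashes, branch, header_master_root):
    subtree_root = merkle_root(sorted(txout_hashes))   # compute the sub-merkle root yourself
    leaf = dhash(script + subtree_root)
    return apply_branch(leaf, branch) == header_master_root   # O(log N) data from any peer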


https://dl.dropboxusercontent.com/u/1139081/BitcoinImg/ReinerAltChain/reinercompression2.png



Alternate Chain:
All data is included on the alternate blockchain, which is maintained through merged mining on the main chain.  This is only one extra tx per block on the main chain.  That is the full extent of its impact on the main chain, and any nodes that are ignoring/unaware of the alt-chain.


https://dl.dropboxusercontent.com/u/1139081/BitcoinImg/ReinerAltChain/reinerchain.png



Yes, this is a huge undertaking.  Yes, there are a lot of uncertainties. Yes, I need a new merkle tree structure.
But, this idea would kill two massive birds with one stone (kill two albatrosses with one block?)

Alright, tear it apart!




UPDATE:

After lots and lots of discussion and debate, I believe that the address index should be maintained as a trie-like structure.  Others have expressed interest in a binary-search tree (BST).  Either way, the structure can be adapted to have the same properties we desire of a merkle tree, but with a lot more flexibility, such as very quick insertion, deletion, querying, updating, etc.  My preference is the creme-de-la-creme of tries -- a hybrid of a PATRICIA tree (level-compressed trie) and a de la Briandais tree (node-compressed).  It looks something like this:


https://dl.dropboxusercontent.com/u/1139081/BitcoinImg/ReinerAltChain/DataStructures_Values.png

The structure would be indexed by TxOut script ("recipient"), and each node is recursively authenticated by the nodes below it.  The uniqueness of the trie structure guarantees that there is exactly one solution for a given set of TxOuts, which also means that only the existing set of TxOuts need to be obtained in order to create the trie (the BST requires replaying all transactions, in order, to have a well-defined internal structure).  For education on trie structures, see my pretty pictures in this post.
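
As a toy illustration of that recursive authentication (this is NOT the actual Reiner-tree serialization; the node layout and hashing are assumptions):

Code:
# Toy authenticated trie node.  Each node's hash commits to its
# skip-string, its value, and its children, so the root hash
# authenticates the entire indexed set of TxOut scripts.
import hashlib

def dhash(b):
    return hashlib.sha256(hashlib.sha256(b).digest()).digest()

class TrieNode:
    def __init__(self, skip=b'', value=None):
        self.skip = skip        # level-compressed prefix (PATRICIA-style)
        self.value = value      # e.g. the root of this script's UTXO subtree
        self.children = {}      # next byte -> TrieNode (sparse, Briandais-style links)

    def node_hash(self):
        data = self.skip + (self.value or b'')
        for byte in sorted(self.children):                  # deterministic child order
            data += bytes([byte]) + self.children[byte].node_hash()
        return dhash(data)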

Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
hazek
Legendary
*
Offline Offline

Activity: 1078
Merit: 1002


View Profile
June 17, 2012, 06:43:53 PM
 #2

Before I read this I just want to quickly post that, whether justifiably or not, I personally feel this is the most pressing issue when it comes to Bitcoin's successful future, and I really hope the core team has planned its order of priorities accordingly.

My personality type: INTJ - please forgive my weaknesses (Not naturally in tune with others feelings; may be insensitive at times, tend to respond to conflict with logic and reason, tend to believe I'm always right)

If however you enjoyed my post: 15j781DjuJeVsZgYbDVt2NZsGrWKRWFHpp
etotheipi
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1048


Core Armory Developer


View Profile WWW
June 17, 2012, 06:47:49 PM
 #3

Before I read this I just want to quickly post that, whether justifiably or not, I personally feel this is the most pressing issue when it comes to Bitcoin's successful future, and I really hope the core team has planned its order of priorities accordingly.

I too believe this is a critical issue for Bitcoin as a whole.  In the past, I had floated the idea that handling blockchain size was critical, but other issues seemed more pressing for the devs at the time -- I didn't have a solid idea to promote, and the blockchain size wasn't so out of hand yet.

One nice benefit of this solution is that, because it's an alt-chain, technically no core devs have to be on board.  It can be done completely independently and operate completely non-disruptively, even with only the support of other devs who believe in it.  I'd certainly like to get core devs interested in it, as they are very smart people who probably have a lot of good ideas to add.  But one of the biggest upsides here is that it can be done completely independently.


Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
socrates1024
Full Member
***
Offline Offline

Activity: 125
Merit: 104


Andrew Miller


View Profile
June 17, 2012, 07:22:01 PM
Last edit: June 17, 2012, 07:46:30 PM by socrates1024
Merited by ETFbitcoin (2)
 #4

Let me try to explain a solution to the 'alternate Merkle tree' you require.

The basic idea is to use a balanced binary search tree, such as a red-black tree or a 2-3 tree. Updating such a data structure, including rebalancing, only requires accessing O(log N) tree nodes. A lite client would be able to verify such an update by receiving just the relevant nodes. There is never a need to recompute the entire tree from scratch. Balancing is strict, in that the worst-case length from the root to a leaf never exceeds O(log N).

There's been a bunch of academic research on this topic, where it's known as "Authenticated Data Structures" [1,2,3]. Although the idea has been around for almost 15 years, I don't know of a single implementation. So I made a toy implementation available at https://github.com/amiller/redblackmerkle
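
To illustrate the principle (a simplified sketch, not the actual redblackmerkle code; the path format is an assumption): every node's hash commits to its key and its subtree hashes, so an O(log N) path of internal nodes is enough to verify membership against a single root hash.

Code:
# Simplified illustration of an authenticated search tree.
import hashlib

EMPTY = hashlib.sha256(b'').digest()

def node_hash(key, left_hash, right_hash):
    return hashlib.sha256(key + left_hash + right_hash).digest()

def verify_path(root_hash, leaf_key, path):
    # path: top-down list of internal nodes as (key, hash_of_other_subtree, went_left);
    # the leaf itself is reconstructed from leaf_key with empty subtrees.
    def recompute(steps):
        if not steps:
            return node_hash(leaf_key, EMPTY, EMPTY)
        key, other, went_left = steps[0]
        child = recompute(steps[1:])
        return node_hash(key, child, other) if went_left else node_hash(key, other, child)
    return recompute(path) == root_hash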

I'm hoping to spread awareness of this technique, since it's pretty clear that some clever use of Merkle trees is going to be important in Bitcoin's future. Let's discuss this!


P.S. Most of these structures only require collision-resistant hash functions. However, if you want to use a fancy hash function with special (homomorphic) properties, you can make even more efficient structures, e.g. a Merkle 'bloom filter' [4].
 

[1] Certificate Revocation and Certificate Update
     Naor and Nissim, 1998. USENIX
     https://www.usenix.net/publications/library/proceedings/sec98/full_papers/nissim/nissim.pdf

[2] Authenticated Data Structures
     Roberto Tamassia, 2003.
     http://cs.brown.edu/research/pubs/pdfs/2003/Tamassia-2003-ADS.pdf

[3] Persistent Authenticated Data Structures and their applications
     Anagnostopoulos, Goodrich, and Tamassia
     http://cs.brown.edu/people/aris/pubs/pad.pdf

[4] Cryptography for efficiency: Authenticated Data Structures Based on Lattices and Parallel Online Memory Checking
     Papamanthou and Tamassia, 2011
     http://www.cse.msstate.edu/~ramkumar/gw-102.pdf

amiller on freenode / 19G6VFcV1qZJxe3Swn28xz3F8gDKTznwEM
[my twitter] [research@umd]
I study Merkle trees, credit networks, and Byzantine Consensus algorithms.
etotheipi
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1048


Core Armory Developer


View Profile WWW
June 17, 2012, 07:33:39 PM
 #5

Let me try to explain a solution to the 'alternate Merkle tree' you require.

The basic idea is to use a balanced binary search tree, such as a red-black tree or a 2-3 tree. Updating such a data structure, including rebalancing, only requires accessing O(log N) tree nodes. A lite client would be able to verify such an update by receiving just the relevant nodes. There is never a need to recompute the entire tree from scratch. Balancing is strict, in that the worst-case length from the root to a leaf never exceeds O(log N).

There's been a bunch of academic research on this topic, where it's known as "Authenticated Data Structures" [1,2,3]. Although the idea has been around for almost 15 years, I don't know of a single implementation. So I made a toy implementation available at https://github.com/amiller/redblackmerkle

I'm hoping to spread awareness of this technique, since it's pretty clear that some clever use of Merkle trees is going to be important in Bitcoin's future. Let's discuss this!


P.S. Most of these structures only require collision-resistant hash functions. However, if you want to use a fancy hash function with special (homomorphic) properties, you can make even more efficient structures, e.g. a Merkle 'bloom filter' [4].
 

[1] Certificate Revocation and Certificate Update
     Naor and Nissim, 1998. USENIX
     https://www.usenix.net/publications/library/proceedings/sec98/full_papers/nissim/nissim.pdf

[2] Authenticated Data Structures
     Roberto Tamassia, 2003.
     http://cs.brown.edu/research/pubs/pdfs/2003/Tamassia-2003-ADS.pdf

[3] Persistent Authenticated Data Structures and their applications
     Anagnostopoulos, Goodrich, and Tamassia
     http://cs.brown.edu/people/aris/pubs/pad.pdf

[4] Cryptography for efficiency: Authenticated Data Structures Based on Lattices and Parallel Online Memory Checking
     Papamanthou and Tamassia, 2011
     http://www.cse.msstate.edu/~ramkumar/gw-102.pdf



Wow, thanks socrates!  

That's an excellent place to start.  Of course, I should've thought about that from the start, since I am adept with data structures, and especially trees/tries of sorts.  I have even coded a red-black tree before...  

My brain was stuck on how to modify the base merkle-tree concept into what I wanted, and didn't consider going back to a different (though related) data structure.

There is one problem, though it may be part of the materials you already referenced:  the tree must be constructed identically by all parties, and from any state.  And all binary-tree structures are insert-order dependent, unless you're storing them in some sort of trie.  Specifying an insert order doesn't work, because someone constructing from scratch doesn't know how someone updating from a previous tree will insert them.  But I wouldn't want to go to a basic trie, due to the space inefficiency.  Something like a PATRICIA tree/trie would probably work, but it's very difficult to implement (correctly) and there aren't a lot of existing, trusted implementations out there.

I'll dig into your materials a bit.  Thanks!


Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
2112
Legendary
*
Offline Offline

Activity: 2128
Merit: 1042



View Profile
June 17, 2012, 07:50:25 PM
 #6

(2c) Merkle-tree Alternative Needed
I think this is the crucial observation. Bitcoin doesn't really have a single contiguous bitstream that would need a protection of a hash tree. It has a collection of transactions that are only weakly ordered in a lattice fashion.

The better data structure would probably be some classic database representation like a B-tree with a cryptographically signed transaction log. To allow integrity verification with a truncated log, the log blocks should contain something like a hash of an inorder traversal of the database content before the update. This will allow for quick verification of new log blocks received from the network.

To allow backward compatibility with the forward-delta block chain, the Bitcoin protocol would need an additional constraint on the ordering of the transactions in the blocks.

Please comment, critique, criticize or ridicule BIP 2112: https://bitcointalk.org/index.php?topic=54382.0
Long-term mining prognosis: https://bitcointalk.org/index.php?topic=91101.0
socrates1024
Full Member
***
Offline Offline

Activity: 125
Merit: 104


Andrew Miller


View Profile
June 17, 2012, 08:12:00 PM
Merited by ETFbitcoin (1)
 #7

There are two kinds of orderings here. First is the order in which updates are made to the Merkle trees. Each block is ordered, and within a block transactions are ordered, and within a transaction the txIns and txOuts are ordered. To update your Merkle trees as the result of a bitcoin transaction, you remove all the inputs, and insert all the new outputs. Everyone should be able to do this in the same order, right?


Now, within a Merkle tree, the order is determined by a key. Think of each Merkle tree as a database index. We would like to have at least two kinds of indexes, which means maintaining two instances of the Merkle tree:

1) TxOuts are identified by the transaction hash and an index within that transaction. So we need to search by (txhash,idx) in order to see if an output has been spent. When outputs are inserted into this tree, they're stored in sorted order according to (txhash,idx).

2) It's also desirable to find all the available txouts for a particular address. Let a second Merkle tree contain keys of the form (scriptpubkey). Now, given a root hash, you can ask for a verifiable list of all your spendable coins.


Alternately, instead of thinking of it as a different tree for each index, you can think of it as a composite structure. The general form is a Search DAG (Directed Acyclic Graph), but the idea is exactly the same [5]. (This includes B-Trees).
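
A tiny sketch of the two key schemes, just to make the "database index" analogy concrete (the exact encodings are assumptions for illustration):

Code:
# Both trees hold the same unspent outputs; only the search key
# (and hence the ordering) differs.
import hashlib

def spentness_key(txhash, idx):
    # Index 1: look up an output by (txhash, idx) to check whether it is spent.
    return txhash + idx.to_bytes(4, 'big')

def wallet_key(scriptpubkey, txhash, idx):
    # Index 2: group outputs by scriptPubKey, so one range query returns every
    # spendable coin for an address (txhash/idx appended to keep keys unique).
    return hashlib.sha256(scriptpubkey).digest() + txhash + idx.to_bytes(4, 'big')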



[5] A General Model for Authenticated Data Structures
     Martel, Nuckolls, Devanbu, Gertz, Kwong, Stubblebine, 2004.
     http://truthsayer.cs.ucdavis.edu/algorithmica.pdf

amiller on freenode / 19G6VFcV1qZJxe3Swn28xz3F8gDKTznwEM
[my twitter] [research@umd]
I study Merkle trees, credit networks, and Byzantine Consensus algorithms.
proudhon
Legendary
*
Offline Offline

Activity: 1820
Merit: 1164



View Profile
June 17, 2012, 08:41:15 PM
 #8

No clue about any of this stuff, but thank you guys for working on this.  I think it's important to have this sorted out before adoption and use gets really heavy.

bitcoin price cannot be sustained above $10k <---- scientific fact
unclescrooge
aka Raphy
Hero Member
*****
Offline Offline

Activity: 868
Merit: 1000


View Profile
June 17, 2012, 09:32:27 PM
 #9

I agree with others, this is a hot issue. And as much as I dislike gambling, I'm thankful SatoshiDice put pressure on the blockchain like this.
etotheipi
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1048


Core Armory Developer


View Profile WWW
June 17, 2012, 09:35:40 PM
Last edit: June 17, 2012, 09:57:38 PM by etotheipi
Merited by ETFbitcoin (1)
 #10

There are two kinds of orderings here. First is the order in which updates are made to the Merkle trees. Each block is ordered, and within a block transactions are ordered, and within a transaction the txIns and txOuts are ordered. To update your Merkle trees as the result of a bitcoin transaction, you remove all the inputs, and insert all the new outputs. Everyone should be able to do this in the same order, right?


Now, within a Merkle tree, the order is determined by a key. Think of each Merkle tree as a database index. We would like to have at least two kinds of indexes, which means maintaining two instances of the Merkle tree:

Consider 10 unspent-TxOuts, {0,1,2,3,4,5,6,7,8,9}.

You and I are both constructing the same binary/red-black tree, except that you are starting from scratch and I am starting from a tree where elements {0,1,2,5,6,9,13} are part of the tree.  

I have to add {3,4,7,8} and remove {13} from my tree.   You just add all 10 elements in a specified order.  

We're going to get different roots.  I technically know how your tree would've been constructed, but I might have to start over and reinsert everything from scratch if I wanted to make sure my tree structure matches yours.  

That's what I mean about "commutative computation" -- we need to make sure that regardless of whether you're starting from scratch, or updating an existing tree from any point in the past, that you'll get the same answer.  

As I said before, a trie would work, but is generally very space inefficient, and that matters here.
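
The problem is easy to demonstrate with a plain (unbalanced) BST -- a rebalancing tree shows the same effect through its rotation history (toy Python, with illustrative hashing only):

Code:
# Same set of keys, different insertion order, different tree shape.
import hashlib

def dhash(b):
    return hashlib.sha256(hashlib.sha256(b).digest()).digest()

class BST:
    def __init__(self):
        self.key = None
    def insert(self, key):
        if self.key is None:
            self.key, self.left, self.right = key, BST(), BST()
        elif key < self.key:
            self.left.insert(key)
        elif key > self.key:
            self.right.insert(key)
    def root_hash(self):
        if self.key is None:
            return dhash(b'')
        return dhash(self.key + self.left.root_hash() + self.right.root_hash())

a, b = BST(), BST()
for k in [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]:
    a.insert(bytes([k]))
for k in [5, 2, 8, 0, 1, 3, 4, 6, 7, 9]:   # same set, different insertion order
    b.insert(bytes([k]))
print(a.root_hash() == b.root_hash())       # False: shapes (and roots) differ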

Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1073


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
June 17, 2012, 10:04:31 PM
Merited by ETFbitcoin (1)
 #11

Question: as users download this alt-chain, do they always download the alt-chain from its genesis block (imagine namecoin), or do they always only download a certain number of blocks back, after which point, all earlier blocks are discarded?  (imagine p2pool).

I would have to imagine it was the second.

If it was the first, the value of the alt-chain would tend to zero over time: it would only be useful for pruning spent transactions that existed at the time the alt-chain was created, as the blocks representing diffs to the alt chain would be proportional to the diffs on the main chain.  That is, at least the way I understood it.

If it was the second, I would imagine that it would be more sensible to create a superblock (say, every 1000 blocks) that publishes a brand new tree, and then all non-superblocks that follow (up to the next superblock) would be updates to that tree.  Anyone downloading the alt-chain would never need more than 1000 blocks: one superblock and all the incremental blocks since the superblock was created.  Anything older than the superblock would be simply unnecessary.
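
For the second option, the bookkeeping is trivial (a Python sketch using the hypothetical 1000-block interval):

Code:
# With a 1000-block superblock interval, a fresh node only needs the
# latest superblock plus the incremental blocks after it.
SUPERBLOCK_INTERVAL = 1000

def blocks_needed(current_height):
    last_superblock = (current_height // SUPERBLOCK_INTERVAL) * SUPERBLOCK_INTERVAL
    return list(range(last_superblock, current_height + 1))

print(len(blocks_needed(187350)))   # 351 -- never more than 1001 blocks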

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1073


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
June 17, 2012, 10:07:29 PM
 #12

I am assuming it is necessary to have a verifiable source for pruned-headers -- a separate blockchain succeeds because correctness of data is required to be accepted.

I believe you could provide this by publishing both your "genesis block" as well as the source code for the one-off utility you make to produce it.  One holding a complete copy of the Bitcoin block chain should be able to run your program and create a bit-for-bit copy of your genesis block.  Get a few people to do this, and to GPG-sign the hash of the resulting output.  Voila.

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
DiThi
Full Member
***
Offline Offline

Activity: 156
Merit: 100

Firstbits: 1dithi


View Profile
June 17, 2012, 10:23:04 PM
 #13

This idea has been scattered throughout some other threads, but there is no one place that fully explains the idea with pictures.

I did. Well, not exactly pictures, but ASCII drawings:

https://en.bitcoin.it/wiki/User:DiThi/MTUT

https://bitcointalk.org/index.php?topic=60911.msg709737#msg709737

I wanted to do a prototype but I have been very busy with other projects since then.

1DiThiTXZpNmmoGF2dTfSku3EWGsWHCjwt
etotheipi
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1048


Core Armory Developer


View Profile WWW
June 17, 2012, 10:35:28 PM
 #14

Question: as users download this alt-chain, do they always download the alt-chain from its genesis block (imagine namecoin), or do they always only download a certain number of blocks back, after which point, all earlier blocks are discarded?  (imagine p2pool).

I would have to imagine it was the second.

If it was the first, the value of the alt-chain would tend to zero over time: it would only be useful for pruning spent transactions that existed at the time the alt-chain was created, as the blocks representing diffs to the alt chain would be proportional to the diffs on the main chain.  That is, at least the way I understood it.

If it was the second, I would imagine that it would be more sensible to create a superblock (say, every 1000 blocks) that publishes a brand new tree, and then all non-superblocks that follow (up to the next superblock) would be updates to that tree.  Anyone downloading the alt-chain would never need more than 1000 blocks: one superblock and all the incremental blocks since the superblock was created.  Anything older than the superblock would be simply unnecessary.

The second blockchain actually would have no substance to it.  It would solely consist of headers which would contain the master-merkle roots.  The merkle-roots are created from the main-chain data, so you go get the data from that network.   There would be no block reward -- your reward is on the main chain, which you are mining at the same time.

So yes, everyone downloads the entire alt-chain.  But the entirety of this extra chain is the headers themselves:  so it's only about 4-5 MB per year.
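
Back-of-the-envelope, assuming the alt-chain headers are comparable to Bitcoin's 80-byte block headers and one block per ~10 minutes:

Code:
# Rough size of the headers-only alt-chain.
HEADER_BYTES = 80
BLOCKS_PER_YEAR = 6 * 24 * 365                 # ~52,560 blocks
print(HEADER_BYTES * BLOCKS_PER_YEAR / 1e6)    # ~4.2 MB per year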

I am assuming it is necessary to have a verifiable source for pruned-headers -- a separate blockchain succeeds because correctness of data is required to be accepted.

I believe you could provide this by publishing both your "genesis block" as well as the source code for the one-off utility you make to produce it.  One holding a complete copy of the Bitcoin block chain should be able to run your program and create a bit-for-bit copy of your genesis block.  Get a few people to do this, and to GPG-sign the hash of the resulting output.  Voila.

See (1e) in the original post:  the point of this exercise is to not have to trust specific nodes, GPG/PGP signatures, add centralization, etc.  You trust the proof-of-work.  The merkle-root with the most work behind it on the alt-chain is the merkle-root that you trust to be correct, just like you do with the main-network headers.

We don't want users to have to set up a list of trusted authorities. Then you have revocation lists.  And keep the list updated.  And maintain GPG keys of such authorities.  And politics about who should be trusted authorities.  And of course, centralization.

Or you set up an alternate blockchain, and trust the data that has the most work behind it.  Voila.

Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1006


View Profile
June 17, 2012, 11:32:21 PM
Last edit: June 18, 2012, 12:03:45 AM by maaku
Merited by ETFbitcoin (1)
 #15

As others have mentioned, a self-balancing binary search tree would solve the only real technical issue here. Red-black trees would work fine, or their generalized parent structure the 2-3-4 tree (B+tree of order 4), which would provide a conceptually cleaner implementation and serialization format.

Overall, great work. I can assist you in implementing it.

There is one problem, though it may be part of the materials you already referenced:  the tree must be constructed identically by all parties, and from any state.  And all binary-tree structures are insert-order dependent, unless you're storing them in some sort of trie.  Specifying an insert order doesn't work, because someone constructing from scratch doesn't know how someone updating from a previous tree will insert them.  But I wouldn't want to go to a basic trie, due to the space inefficiency.  Something like a PATRICIA tree/trie would probably work, but it's very difficult to implement (correctly) and there aren't a lot of existing, trusted implementations out there.
(EDIT: Sorry, this is basically what socrates said above. I should have read the whole thread first.)

This is a non-issue; simply specify the order of insertion/deletion. For example: “Process blocks in order; for each block process transactions in order; and for each transaction first delete all inputs (in order) from, then insert all outputs (in order) into the alt-chain Merkle tree”. You might have to throw a special case in there for transactions that have as input the output of a transaction that occurs later in the same block (is that even allowed?).
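
In sketch form (Python, with hypothetical block/tx field names and an assumed insert/delete interface on the authenticated tree):

Code:
# Canonical update order: blocks in order, transactions in order, delete
# each spent input, then insert the new outputs.

def apply_block(tree, block):
    for tx in block.transactions:                  # in-block order
        if not tx.is_coinbase:
            for txin in tx.inputs:                 # delete spent outputs first
                tree.delete((txin.prev_txhash, txin.prev_index))
        for idx, txout in enumerate(tx.outputs):   # then insert new outputs
            tree.insert((tx.txhash, idx), txout)
    return tree.root_hash()                        # the value committed in the alt-chain header

def rebuild(tree_at_checkpoint, blocks_since_checkpoint):
    tree = tree_at_checkpoint
    for block in blocks_since_checkpoint:
        apply_block(tree, block)
    return tree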

Why (and how) would you be creating a tree from scratch without either access to the blockchain or the tree in the last alt-chain checkpoint?

Quote from: etotheipi
Consider 10 unspent-TxOuts, {0,1,2,3,4,5,6,7,8,9}.

You and I are both constructing the same binary/red-black tree, except that you are starting from scratch and I am starting from a tree where elements {0,1,2,5,6,9,13} are part of the tree.  

I have to add {3,4,7,8} and remove {13} from my tree.   You just add all 10 elements in a specified order.  

We're going to get different roots.  I technically know how your tree would've been constructed, but I might have to start over and reinsert everything from scratch if I wanted to make sure my tree structure matches yours.  

That's what I mean about "commutative computation" -- we need to make sure that regardless of whether you're starting from scratch, or updating an existing tree from any point in the past, that you'll get the same answer.

No, that's going in the wrong direction. Updates to the blockchain occur in atomic operations: blocks. Simply mandate that trees are constructed/updated according to the canonical ordering provided by the blockchain. If you insist on creating the search tree from scratch, simply replay the blockchain inserting and removing in the order specified therein. Or you can start from the tree of the last found alt-chain checkpoint, and replay insertions & deletions from that point forward.

Yes, order of operations matters, so standardize an order of operations.

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
etotheipi
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1048


Core Armory Developer


View Profile WWW
June 18, 2012, 12:03:41 AM
 #16

Two big issues brought up so far, in outside discussion:


(1):  I am currently in respectful disagreement with Socrates about the ability to construct stateless unspent-txout-trees.   Recap:  If one were to use a red-black tree to store this info, then the exact structure of that tree will depend on the entire, exact, incremental history of additions and deletions to the tree starting from the genesis block.  To construct the tree from scratch, I must replay the entire history in the same order every time.

I am of the opinion that you should be able to start with the current list of unspent-TxOuts, however it is that you got them, and be able to construct the correct tree without knowing anything about the history of how elements were added and deleted.  With vanilla red-black trees, if I start with just the current unspent-TxOut list, I have every node in the tree -- but I need the entire history of already-spent TxOuts just to be able to replay that history to construct the tree correctly.  This seems inelegant and dangerous.  But I can't figure out why, other than gut instinct.

The counter argument is that you will never find yourself in this position:  you are either downloading and processing the whole chain, in which case you have no problem replaying the incremental updates.  Or you download from some checkpoint, in which case you can still replay the insertions and deletions from that checkpoint with minimal effort.


(2):  I misunderstood DiThi's original proposal, as a much simpler tree structure that could not provide the trustless lite-node behavior that my trees do.  However, as I re-read it, I'm realizing that I think he's right -- a simpler tree of unspent TxOuts does actually give you that capability.  Is this right?

A couple months ago when I first theorized this idea, I had a very good reason for believing that you needed to aggregate the unspent-TxOuts (UTXOs) by address/script.  If you didn't do it, bad things would happen.  Unfortunately, I don't remember what those bad things were!  It's entirely reasonable that I mis-applied some logic in my head and decided I needed an unnecessarily-complex data structure to achieve security.  

So, am I wrong?  I'm not sure.  What are the weaknesses and advantages of DiThi's tree structure vs. mine (his leaf nodes are UTXOs, mine are roots of UTXO subtrees)?  One thing I can think of is:  how do you know that a peer gave you the complete list of UTXOs and is not hiding the rest?  Though, I've already made the assumption that you have at least one honest peer, so that's probably not an issue... is it?


Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
BrightAnarchist
Donator
Legendary
*
Offline Offline

Activity: 853
Merit: 1000



View Profile
June 18, 2012, 12:09:42 AM
 #17

listening... very good ideas here, similar to my balance chain concept :) but much more fleshed out
maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1006


View Profile
June 18, 2012, 12:17:43 AM
 #18

Quote
The counter argument is that you will never find yourself in this position:  you are either downloading and processing the whole chain, in which case you have no problem replaying the incremental updates.  Or you download from some checkpoint, in which case you can still replay the insertions and deletions from that checkpoint with minimal effort.
Correct. So why does it matter? I'm not sure why it's “inelegant” either, since as you point out there is no use case for recreating the tree from only an unsorted list of outputs anyway.

Quote
A couple months ago when I first theorized this idea, I had a very good reason for believing that you needed to aggregate the unspent-TxOuts (UTXOs) by address/script.  If you didn't do it, bad things would happen.  Unfortunately, I don't remember what those bad things were!  It's entirely reasonable that I mis-applied some logic in my head and decided I needed an unnecessarily-complex data structure to achieve security.
If nothing else, it is certainly convenient to pull in all the unspent outputs for an address without having to go all over the tree, as you can under your approach. That's a very common use case, so it would make sense to optimize for it.

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
etotheipi
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1048


Core Armory Developer


View Profile WWW
June 18, 2012, 12:27:37 AM
 #19

Quote
The counter argument is that you will never find yourself in this position:  you are either downloading and processing the whole chain, in which case you have no problem replaying the incremental updates.  Or you download from some checkpoint, in which case you can still replay the insertions and deletions from that checkpoint with minimal effort.
Correct. So why does it matter? I'm not sure why it's “inelegant” either, since as you point out there is no use case for recreating the tree from only an unsorted list of outputs anyway.

I didn't point that out, because I don't think I agree with it.  It's a counter-argument that, while I can't dispute it directly at the moment, makes me extremely uncomfortable.  What future use case haven't we considered that would be stunted by this decision?  I'm only agreeing that I don't have a direct counter to the argument for it (and thus cannot fully defend my position).

But I'd hardly believe I'm the only one who would be bothered by it:  I think it's extremely inelegant that, if I have every node in the tree, I still have to download GBs more data just to know how to organize it (or rather, I might as well just re-download the tree directly at that point).

Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
socrates1024
Full Member
***
Offline Offline

Activity: 125
Merit: 104


Andrew Miller


View Profile
June 18, 2012, 12:51:42 AM
Merited by ETFbitcoin (1)
 #20

I had a very good reason for believing that you needed to aggregate the unspent-TxOuts (UTXOs) by address/script.  If you didn't do it, bad things would happen.  Unfortunately, I don't remember what those bad things were!

One of the neat things about a Merkle search structure, rather than just an arbitrarily-ordered Merkle tree, is that you can prove that a key is not in the database. Even with a typical Merkle tree, like the current blockchain, it would require a linear effort to prove that a transaction doesn't exist - assuming all you have is an O(1) root hash, and you don't trust anyone!

Even more generally, you can do a verified 'range query' for only O(M log N) effort (where M is the number of results, N is the size of the tree). If you store each unspent-coin in a binary search tree, ordered by the address, then you can ask an untrusted server to give you a snapshot of all the spendable coins associated with that address. There's no way for them to omit any.

Let me try to describe the scenario how I prefer, since it's hard to keep track of the terms otherwise. There are two parties, the Lite-Client (Client) and the Helper (Server). The goal is for the Client not to have to trust the Server, but for the Server to store all the data. The Client only ever needs to store (like, on disk) a constant O(1) amount of state - the root hash. In order to decide what root hash to use, the Client will have to rely on the proof-of-work to recognize the most recent block.

If the Client asks the Server for the list of unspent-coins matching a target address, he receives two or more O(log N) paths through the Merkle tree. The first is a path to the element with the largest address (lexical ordering) that is smaller than the target address. The last one is a path to the smallest element with a larger address. If there are no transactions matching the target, then these two elements will be adjacent. In any case, the Client iterates through the paths he receives, at each step checking that the paths are adjacent in the tree (and, of course, that the hashes are consistent and lead to the root).

[edit]I hope this turns into a Merkle trees megathread![/edit]
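
For concreteness, here is a simplified sketch of that non-membership check, treating the structure as a Merkle tree over sorted leaves; the proof format (leaf-to-root lists of (sibling_hash, sibling_is_right) pairs) is an assumption:

Code:
# The untrusted Helper supplies two leaves that bracket the target key
# plus their authentication paths; the Client checks that both paths
# lead to the root and that the two leaves sit at adjacent positions.
import hashlib

def dhash(b):
    return hashlib.sha256(hashlib.sha256(b).digest()).digest()

def path_to_root(leaf, path):
    h, index = dhash(leaf), 0
    for level, (sibling, sibling_is_right) in enumerate(path):
        if sibling_is_right:
            h = dhash(h + sibling)        # our node is the left child at this level
        else:
            h = dhash(sibling + h)        # our node is the right child at this level
            index += 1 << level
    return h, index                       # index = leaf position in the sorted tree

def prove_absent(root, target_key, left_leaf, left_path, right_leaf, right_path):
    lh, li = path_to_root(left_leaf, left_path)
    rh, ri = path_to_root(right_leaf, right_path)
    return (lh == root and rh == root                  # both paths authenticate
            and ri == li + 1                           # the two leaves are adjacent
            and left_leaf < target_key < right_leaf)   # and they bracket the target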

amiller on freenode / 19G6VFcV1qZJxe3Swn28xz3F8gDKTznwEM
[my twitter] [research@umd]
I study Merkle trees, credit networks, and Byzantine Consensus algorithms.
FreeMoney
Legendary
*
Offline Offline

Activity: 1246
Merit: 1014


Strength in numbers


View Profile WWW
June 18, 2012, 12:55:55 AM
 #21

Why would people mine this other chain?

If it has lower hashing power, is everyone trusting it more susceptible to double spends somehow?

Play Bitcoin Poker at sealswithclubs.eu. We're active and open to everyone.
maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1006


View Profile
June 18, 2012, 01:07:17 AM
 #22

Conceivably you could run an even lighter client that just implicitly trusts the head of the alt-chain. Such a client would rely upon honest miners doing the work of verifying the alt-chain, and not actually perform such checks itself.

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
Maged
Legendary
*
Offline Offline

Activity: 1204
Merit: 1006


View Profile
June 18, 2012, 04:12:10 AM
 #23

Why would people mine this other chain?
Why do miners currently mine transactions that don't have a fee? The answer to both of these questions is that doing such things isn't terribly costly, but more importantly, it encourages further adoption of Bitcoin, which will directly result in increased revenue for miners in the future.

Serith
Sr. Member
****
Offline Offline

Activity: 269
Merit: 250


View Profile
June 18, 2012, 06:52:14 AM
Last edit: June 18, 2012, 09:42:38 AM by Serith
 #24

I think that if you want to have the root value to be the same regardless of order in which a tree or any hierarchical structure was created, then you would need a hash function that has commutative and associative properties.
Technomage
Legendary
*
Offline Offline

Activity: 2184
Merit: 1056


Affordable Physical Bitcoins - Denarium.com


View Profile WWW
June 18, 2012, 09:50:08 AM
 #25

Why do miners currently mine transactions that don't have a fee? The answer to both of these questions is that doing such things isn't terribly costly, but more importantly, it encourages further adoption of Bitcoin, which will directly result in increased revenue for miners in the future.
This is changing faster than we've anticipated though. Due to the sheer volume of transactions that SatoshiDice produces, higher fees will start to get priority. The bad thing is that current Bitcoin clients don't have a very sophisticated fee system or fee options.

Denarium closing sale discounts now up to 43%! Check out our products from here!
ffe
Sr. Member
****
Offline Offline

Activity: 308
Merit: 250



View Profile
June 18, 2012, 10:35:38 AM
 #26

Tracking
jl2012
Legendary
*
Offline Offline

Activity: 1792
Merit: 1010


View Profile
June 18, 2012, 10:38:18 AM
 #27

I have a proposal as an "add-on" to this

When a miner sees a tx, he could either 1. grab the tx fee, or 2. "donate" the tx fee to the alt-chain.

Grabbing the tx fee is essentially the current practice.

If it is "donated", it will go to a jackpot pool for the alt-chain. A miner who finds a valid block in the alt-chain will claim the jackpot.

By donating the tx fee, miners will get a "discount" on their mining difficulty. The discount will be calculated in a pay-per-share manner. For example, the current difficulty is 1583177.847444 with a block reward of 50 BTC. A fair PPS would be 0.00003158. Therefore, a tx fee of 0.0005 is equivalent to 15.831778 shares. By donating the tx fee, the miner will only need to find a block with difficulty of 1583177.847444 - 15.831778 = 1583162.015666.

Miners will want to donate tx fees since this helps them find a block more easily and reduces variation. It also provides a feedback mechanism for mining. In a bad-luck run where many tx accumulate, difficulty is reduced and the bad-luck run can be ended earlier.
The effective difficulty is no longer a constant throughout 2016 blocks. Instead, it will be a zig-zag function: highest at the beginning of a round, decreasing as unconfirmed tx accumulate, and jumping back to the highest value when a block is found. The average block generation rate is still 10 minutes, but the variation is reduced too.

Some miners may abuse this system by creating tx with high tx fee (e.g. 25BTC) and donating it to the alt-chain, so their difficulty is reduced by 50%. To prevent this, we may put a cap on the difficulty discount (e.g. 10% of difficulty) and/or calculate the discount with <100% PPS (So the expected return will be decreased).
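
Working through the numbers in that example (minimal Python; the fair PPS rate per difficulty-1 share is taken as block_reward / difficulty):

Code:
# Reproducing the figures above with pay-per-share accounting.
difficulty   = 1583177.847444
block_reward = 50.0
fair_pps     = block_reward / difficulty        # ~0.00003158 BTC per difficulty-1 share

tx_fee = 0.0005
shares = tx_fee / fair_pps                      # ~15.831778 shares
discounted_difficulty = difficulty - shares     # ~1583162.015666
print(fair_pps, shares, discounted_difficulty)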

Donation address: 374iXxS4BuqFHsEwwxUuH3nvJ69Y7Hqur3 (Bitcoin ONLY)
LRDGENPLYrcTRssGoZrsCT1hngaH3BVkM4 (LTC)
PGP: D3CC 1772 8600 5BB8 FF67 3294 C524 2A1A B393 6517
flower1024
Legendary
*
Offline Offline

Activity: 1428
Merit: 1000


View Profile
June 18, 2012, 10:41:10 AM
 #28

isn't it possible to merge mine the balance chain just like many pools do with namecoin?
that way there is no need to give any reward to miners. just include it in bitcoind as a "protocol requirement" for mining.
etotheipi
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1048


Core Armory Developer


View Profile WWW
June 18, 2012, 02:12:07 PM
 #29

isn't it possible to merge mine the balance chain just like many pools do with namecoin?
that way there is no need to give any reward to miners. just include it in bitcoind as a "protocol requirement" for mining.

Yup.  The second chain doesn't need incentive due to merged mining, unless it's particularly resource-consuming.  If the software is already written, and integrates into bitcoind or other mining software transparently, and doesn't impact mining performance, then miners would [likely] have no complaints about adopting it given the huge upside it offers.  It's basically free, from their perspective.

Though I don't agree it would be a "protocol requirement."  Miners have various reasons for doing what they do.  And this is being promoted as non-disruptive and optional.  But you probably don't need more than 20% of mining power engaging in this idea for it to be successful.  I'm not even sure what a 51% attack would look like on the alt-chain, but it wouldn't be very exciting -- the worst you could do is prevent some lite-nodes from being able to verify their own balance -- but there would be no financial gain from it, and it would cost a fortune.  20% seems like a number high enough that there would always be a "checkpoint" nearby, and high enough that it would be vastly too expensive for a prankster to do anything to that chain.


Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
blueadept
Full Member
***
Offline Offline

Activity: 225
Merit: 100


View Profile
June 18, 2012, 02:29:00 PM
 #30

If the altchain has some place to put a scriptPubKey or an array of them in every block, the chain could be funded with network assurance contracts as proposed by Mike Hearn for the main chain when the subsidy gets lower than fees.

Like my posts?  Connect with me on LinkedIn and endorse my "Bitcoin" skill.
Decentralized, instant off-chain payments.
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1073


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
June 18, 2012, 02:51:55 PM
Merited by ETFbitcoin (1)
 #31

If the idea works and becomes essential to the way most people use Bitcoin, all developers could easily strategize and decide that a future version of clients will only relay/accept blocks when their coinbase contains a valid merged mining record for the other chain's most recent valid block.  It might then be properly called a "meta chain" rather than an "alt chain".

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1006


View Profile
June 18, 2012, 05:19:18 PM
 #32

It might then be properly called a "meta chain" rather than an "alt chain".
That's much better terminology. Thank you.

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
FreeMoney
Legendary
*
Offline Offline

Activity: 1246
Merit: 1014


Strength in numbers


View Profile WWW
June 18, 2012, 05:35:18 PM
 #33


isn't it possible to merge mine the balance chain just like many pools do with namecoin?
that way there is no need to give any reward to miners. just include it in bitcoind as a "protocol requirement" for mining.

Yup.  The second chain doesn't need incentive due to merged mining, unless it's particularly resource-consuming.  If the software is already written, and integrates into bitcoind or other mining software transparently, and doesn't impact mining performance, then miners would [likely] have no complaints about adopting it given the huge upside it offers.  It's basically free, from their perspective.

Though I don't agree it would be a "protocol requirement."  Miners have various reasons for doing what they do.  And this is being promoted as non-disruptive and optional.  But you probably don't need more than 20% of mining power engaging in this idea for it to be successful.  I'm not even sure what a 51% attack would look like on the alt-chain, but it wouldn't be very exciting -- the worst you could do is prevent some lite-nodes from being able to verify their own balance -- but there would be no financial gain from it, and it would cost a fortune.  20% seems like a number high enough that there would always be a "checkpoint" nearby, and high enough that it would be vastly too expensive for a prankster to do anything to that chain.


Thanks, that makes sense.

Play Bitcoin Poker at sealswithclubs.eu. We're active and open to everyone.
theymos
Administrator
Legendary
*
Offline Offline

Activity: 4270
Merit: 8777


View Profile
June 18, 2012, 06:16:39 PM
 #34

Trustless lightweight-node support:  New nodes entering the network for the first time will only have to download a tiny amount of data to get full, verifiable knowledge of their balance and how to spend it (much of which can be stored between loads).  A single honest peer out of thousands guarantees you get, and recognize, good data.

It doesn't seem trustless to me. Lightweight nodes (not storing all unspent outputs) can't know whether a block is valid, so they need to trust the majority of the network's mining power. This is no more secure than SPV, though possibly a little easier for lightweight nodes.

There is an advantage to "mostly-full" nodes who store all unspent transactions: they don't have to download all past blocks to start validating. But a regular Merkle tree of unspent outputs works just as well.

Using an alt chain instead of putting the Merkle root in a main-chain transaction seems unnecessarily complex.

1NXYoJ5xU91Jp83XfVMHwwTUyZFK64BoAD
etotheipi
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1048


Core Armory Developer


View Profile WWW
June 18, 2012, 06:43:45 PM
Merited by ETFbitcoin (1)
 #35

Trustless lightweight-node support:  New nodes entering the network for the first time will only have to download a tiny amount of data to get full, verifiable knowledge of their balance and how to spend it (much of which can be stored between loads).  A single honest peer out of thousands guarantees you get, and recognize, good data.

It doesn't seem trustless to me. Lightweight nodes (not storing all unspent outputs) can't know whether a block is valid, so they need to trust the majority of the network's mining power. This is no more secure than SPV, though possibly a little easier for lightweight nodes.

This doesn't make sense at all.  The entirety of all Bitcoin trust relies on trusting "the majority of the network's mining power."  That is the security model of Bitcoin itself, and has the benefit that you only need one honest node out of one million to be able to distinguish truth amongst a sea of malicious peers.

The issue with SPV is that you have to trust either random peers to give you the correct/complete info, or connect to your own trusted system/subscription to get reliable information.  Without a subscription service, you have to query lots of peers and possibly use filtering/profiling games with peers to identify untrusted nodes, etc.   With this idea, you minimize the amount of information needed to be downloaded, and can compare directly against the headers -- you can get all the information you need from a single peer and know it's right (assuming you've already got the headers).

And that is actually just a side-benefit of it -- the original goal was blockchain compression, which this does near-optimally.

I agree that complexity is high.  But I disagree with the notion that you're not getting something for it.

Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
theymos
Administrator
Legendary
*
Offline Offline

Activity: 4270
Merit: 8777


View Profile
June 18, 2012, 07:01:04 PM
 #36

This doesn't make sense at all.  The entirety of all Bitcoin trust relies on trusting "the majority of the network's mining power."  That is the security model of Bitcoin itself, and has the benefit that you only need one honest node out of one million to be able to distinguish truth amongst a sea of malicious peers.

If I have a copy of all unspent outputs, the majority of mining power can only change the ordering of transactions (and this ability might be more limited in the future). But a lightweight node will accept double-spends within the block chain, too-high block subsidies, invalid scripts, etc. if the attacker has enough mining power.

Quote from: etotheipi
The issue with SPV is that you have to trust either random peers to give you the correct/complete info, or connect to your own trusted system/subscription to get reliable information.

You don't have to trust random peers with SPV. SPV clients have the block headers and can use the Merkle roots to accurately get the number of confirmations for any transaction.

1NXYoJ5xU91Jp83XfVMHwwTUyZFK64BoAD
socrates1024
Full Member
***
Offline Offline

Activity: 125
Merit: 104


Andrew Miller


View Profile
June 18, 2012, 07:08:23 PM
 #37

If I have a copy of all unspent outputs, the majority of mining power can only change the ordering of transactions (and this ability might be more limited in the future). But a lightweight node will accept double-spends within the block chain, too-high block subsidies, invalid scripts, etc. if the attacker has enough mining power.

Notice that if the lightweight Client runs the Merkle tree scheme I described, it can reject double-spends, invalid scripts, etc., even without needing to store a copy of the unspent outputs.

Instead, the Client receives each block header, as well as O(M log N) of verification data (paths through the Merkle tree). This way, the Client can check each transaction against the database, as indicated by the current root hash. Only a full node acting as an untrusted helper needs to store the entire unspent outputs database, in order to generate the verification data.
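
This is an untrusted-helper pattern: the Client already trusts a root hash (committed to in a block header), and every piece of data the helper supplies comes with a Merkle path back to that root.  Below is a minimal sketch in Python of the path-verification building block, assuming double-SHA256 hashing and a proof given as (sibling hash, sibling-is-right) pairs from leaf to root; it illustrates the general technique, not the exact scheme described in this thread.

Code:
import hashlib

def dhash(b):
    """Double SHA-256, as used for Bitcoin's transaction and block hashing."""
    return hashlib.sha256(hashlib.sha256(b).digest()).digest()

def verify_merkle_path(leaf_data, proof, expected_root):
    """Walk from a leaf up to the root using the supplied sibling hashes.

    proof: list of (sibling_hash, sibling_is_right) pairs, leaf level first.
    Returns True if the recomputed root equals the root the Client already
    trusts (e.g. one committed to in a verified block header).
    """
    h = dhash(leaf_data)
    for sibling, sibling_is_right in proof:
        h = dhash(h + sibling) if sibling_is_right else dhash(sibling + h)
    return h == expected_root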

amiller on freenode / 19G6VFcV1qZJxe3Swn28xz3F8gDKTznwEM
[my twitter] [research@umd]
I study Merkle trees, credit networks, and Byzantine Consensus algorithms.
etotheipi
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1048


Core Armory Developer


View Profile WWW
June 18, 2012, 07:13:33 PM
 #38

Quote from: etotheipi
The issue with SPV is that you have to trust either random peers to give you the correct/complete info, or connect to your own trusted system/subscription to get reliable information.

You don't have to trust random peers with SPV. SPV clients have the block headers and can use the Merkle roots to accurately get the number of confirmations for any transaction.

Yes, but how do you know a particular TxOut hasn't been spent since it appeared in the blockchain?  I can verify, by downloading full transactions, that a particular TxOut was valid at one point in time, but I have no way to determine if it's been spent since that time.  Whether I'm verifying my own balance, or trying to confirm the validity of the inputs of another transaction, any malicious node can give me seemingly correct information, but still make my node incapable of operating -- perhaps because when I got online the first time and imported my wallet, I was given only old TxOuts that have since been spent, and my node will be stuck creating invalid transactions trying to spend them.

This is the basis of the overlay network that Electrum/Stratum uses.  I don't think there's anything wrong with that overlay network, other than that it relies on trusted nodes being set up and available, or on subscribing to a service, all of which is a degree of centralization (and extra user effort).  This chain avoids all of that, providing (at the expense of complexity) more reliable information without trusting anyone.

If that was the only benefit, I'd agree it wouldn't be worth the hassle.  But as I said, that's a secondary benefit of this structure.  The main reason to do it is compression.

Which could be implemented with a simpler tree structure using an alt/meta-chain.  Or this tree-structure using a different distribution technique (protocol change, overlay network, trusting random peers, etc).

Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
theymos
Administrator
Legendary
*
Offline Offline

Activity: 4270
Merit: 8777


View Profile
June 18, 2012, 07:32:02 PM
 #39

This is the basis of the overlay network that Electrum/Stratum uses.

Electrum and Stratum don't use SPV. Those clients don't keep block headers (AFAIK). BitcoinJ uses SPV.

Yes, but how do you know a particular TxOut hasn't been spent since it appeared in the blockchain?

I wait until it has 6 confirmations. SPV allows me to determine the number of confirmations accurately (assuming the majority of mining power is honest).

1NXYoJ5xU91Jp83XfVMHwwTUyZFK64BoAD
etotheipi
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1048


Core Armory Developer


View Profile WWW
June 18, 2012, 09:25:36 PM
 #40

This is the basis of the overlay network that Electrum/Stratum uses.

Electrum and Stratum don't use SPV. Those clients don't keep block headers (AFAIK). BitcoinJ uses SPV.

Yes, but how do you know a particular TxOut hasn't been spent since it appeared in the blockchain?

I wait until it has 6 confirmations. SPV allows me to determine the number of confirmations accurately (assuming the majority of mining power is honest).

I'm not concerned about enough confirmations.  I'm actually concerned that it has 10,000 confirmations, and that it was actually spent 3,000 blocks ago, and that I have to search through 10,000 full blocks in order to know whether it's still a valid output.  Or, I can ask other nodes for help, but how do I know to trust them?

Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
theymos
Administrator
Legendary
*
Offline Offline

Activity: 4270
Merit: 8777


View Profile
June 18, 2012, 10:38:31 PM
 #41

I'm not concerned about enough confirmations.  I'm actually concerned that it has 10,000 confirmations, and that it was actually spent 3,000 blocks ago, and that I have to search through 10,000 full blocks in order to know whether it's still a valid output.  Or, I can ask other nodes for help, but how do I know to trust them?

SPV nodes keep copies of all of their own transactions, so they should always know when outputs they own are spent. They don't need to query the network for this information.

1NXYoJ5xU91Jp83XfVMHwwTUyZFK64BoAD
maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1006


View Profile
June 18, 2012, 10:45:07 PM
 #42

SPV nodes keep copies of all of their own transactions, so they should always know when outputs they own are spent. They don't need to query the network for this information.
What if you have the same wallet on two systems?

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
galambo
Sr. Member
****
Offline Offline

Activity: 672
Merit: 252


View Profile
June 18, 2012, 11:17:08 PM
Last edit: June 19, 2012, 03:55:44 AM by galambo
 #43

No clue about any of this stuff, but thank you guys for working on this.  I think it's important to have this sorted out before adoption and use gets really heavy.

I think it's important that everybody understand this, and I didn't see anyone explaining it in general terms.

The Blockchain is a distributed computer file system containing a double-entry accounting ledger. Each transaction has two sides, which you may be familiar with from accounting: inputs and outputs, OR debits and credits. However, a major difference is that bitcoin forces a debit (input) to exist for every credit (output).  Storing all of this takes a lot of space.

This proposal will continuously "balance the books." In accounting, when you close out the books for the quarter, all of the debits and credits are summed, and the difference between the two is entered as a "balance" transaction. Because we know that bitcoin forces every credit (output) to have a debit (input), we only have to keep track of all credits (outputs) that are not claimed by a debit (input) to obtain the balance of each address.

The proposal is for a system to store the references to these unspent outputs in a data structure for quick downloading. It doesn't suggest how this tree would be updated efficiently, or how you would quickly grab all of the unspent outputs belonging to one address. This is under discussion.
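
To make that bookkeeping concrete, here is a toy sketch (my own illustration, with a hypothetical minimal transaction format -- real transactions carry scripts, amounts, signatures, etc.) of how the set of "credits not yet claimed by a debit" falls out of scanning the history in order:

Code:
def build_unspent_set(transactions):
    """transactions: iterable of dicts with 'txid',
    'inputs' as a list of (prev_txid, output_index) pairs, and
    'outputs' as a list of (address, value) pairs."""
    unspent = {}                                 # (txid, index) -> (address, value)
    for tx in transactions:
        for prev_txid, idx in tx['inputs']:
            unspent.pop((prev_txid, idx), None)  # each credit claimed by a debit disappears
        for i, out in enumerate(tx['outputs']):
            unspent[(tx['txid'], i)] = out       # new credits, not yet claimed
    return unspent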
etotheipi
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1048


Core Armory Developer


View Profile WWW
June 19, 2012, 01:30:45 AM
Merited by ETFbitcoin (1)
 #44

I'm not concerned about enough confirmations.  I'm actually concerned that it has 10,000 confirmations, and that it was actually spent 3,000 blocks ago, and that I have to search through 10,000 full blocks in order to know whether it's still a valid output.  Or, I can ask other nodes for help, but how do I know to trust them?

SPV nodes keep copies of all of their own transactions, so they should always know when outputs they own are spent. They don't need to query the network for this information.

Sure they don't.  Until they do.  Maybe they haven't been online in a couple weeks and 2 GB of blockchain has happened since then.  Or maybe they want to check another address that they never owned to begin with.  Or maybe you just imported your wallet to another device (as Maaku said), or your HDD failed and you are restoring from backup?   Are you now expected to become a nearly-full node to rescan the entire transaction history since your wallet was created 2 years ago?

You can ask peers for it, and do fancy stuff to convince yourself the history you have been given is real and complete.  Or you avoid all of that and just get it from any peer you see on the network, without having to trust anyone other than the majority hashing power.  Download it, verify it, and you're done.




Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1073


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
June 19, 2012, 04:43:37 AM
Last edit: June 19, 2012, 06:22:26 AM by casascius
 #45

OK, so I read that one of the problems making this tree work would be having to deal with recomputing it after each transaction.

What if the meta tree were not kept in a chain?  What if it were just a living thing that could be maintained by peers, and the only thing that would be kept in the chain is the merkle root of the meta tree put into the coinbase transaction?  A peer could request the whole tree from any peer who had it (assuming the peer "offered" the tree).

Further, to lower the overhead associated with updating the tree, let's say that a complete tree rebalance happens every 100 blocks or so.  Until then (i.e. unless blockheight % 100 == 0), leaf nodes would be removed from the tree by replacing them with a placeholder leaf node that would sort into the same place.  This way, one node can prove to another that no real records exist for that address by serving the placeholder node.

For security's sake, each peer offering the meta tree also ought to offer several versions (at least 2) of the meta tree as it looked when it was recreated at a 100-block checkpoint (more on that later).

So to illustrate this with an example...

I am a bitcoin client that comes to the network.  (This hypothetical network has evolved to the point where including the correct merkle root of the current meta tree in the coinbase is mandatory.)

First, I get all of the 80-byte headers going back to the genesis block and persuade myself that I have them all when I do.

Then, if I am interested in my own copy of the whole meta tree (for the good of the network), I ask a peer for it.  The peer sends it to me in its entirety.  I validate it against the merkle root I believe to be valid.

Instead of requesting the up-to-the-minute meta tree, I could request the meta tree 2 checkpoints ago (between 100-200 blocks ago) along with the corresponding full blocks with all the transactions, and play those transactions back.  This will help me be sure that I'm not looking at an orphan fork that completely replaced the meta tree with something else.

If I am NOT interested in my own copy of the whole meta tree, I give a peer a Bitcoin address and ask it to give me the leaf nodes surrounding that address in the sort order, along with the tree lineage to prove it's part of the tree.

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
tevirk
Newbie
*
Offline Offline

Activity: 15
Merit: 0



View Profile
June 19, 2012, 06:20:35 AM
 #46

I don't see that even the complete tree of outstanding balances needs to be validated anyway.

To validate a transaction, I need answers to these questions:

  • What block (number and hash) is the latest?
  • In which block (number and hash) ancestral to block aaaaa is transaction xxxxx?
  • Is output n of transaction xxxxx spent in any block betweeen number bbbbb (with hash ppppp) and number ccccc (with hash qqqqq)?

A service can give signed answers to all those questions.  If it ever gives a wrong answer, it can be checked and proved to be wrong by anyone with the complete blockchain.   It wouldn't be hard for a client to do random spot-checks on a service at a low rate -- again, if it ever gets a signed wrong answer, it has proof for all to see that the service it used was unreliable.

Running a service like that is easy: all you need is enough cpu to keep up with the blockchain, which is trivial, and enough disk to store the blockchain and indexes -- currently about 5GB, say 1TB looking forward a few years.  Again, that is trivial.  The services could be run by ISPs, exchanges, online stores, enthusiasts, and could easily be ad-supported.  It could easily be made into an appliance that would cost only about USD200 at current prices, meaning any retailer using bitcoin regularly could have one of their own.  The client could check with a few different services: again, any wrong answer can be proved to be wrong by anyone with the blockchain.

That surely is the way forward in the long term.
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1073


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
June 19, 2012, 06:26:33 AM
 #47

I don't see that even the complete tree of outstanding balances needs to be validated anyway.
...

If it ever gives a wrong answer, it can be checked and proved to be wrong by anyone with the complete blockchain.

...

That surely is the way forward in the long term.


This thread is about a proposal to one-up that: to create a service whose wrong answers can be proven wrong instantly, automatically, and cryptographically without anything more than a single merkle root you are certain represents the latest state of the chain.  That's better than trusting a service where the only recourse is that you could rat them out if you manage to catch them lying, which you'll never be in a position to do unless you're also in a position to not need their services.

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
Realpra
Hero Member
*****
Offline Offline

Activity: 816
Merit: 1000


View Profile
June 19, 2012, 06:34:18 AM
 #48

Once you cut down on the section of the blockchain you need to download while still verifying, you have essentially a swarm node.

I suggested something very similar in my thread about pruning not being enough.

However if you create a tx log instead of a merkle tree then you break compatibility, since the "root hash/log signature" will be different.


My proposal used heavily pruned merkle tree sections to send individual transactions without trust and had no forking.

You would check a new address really had funds by asking the network for the branches concerning just that address.

Even if the majority of nodes withheld information you would only need one honest node to get the complete picture in a short download.

Cheap and sexy Bitcoin card/hardware wallet, buy here:
http://BlochsTech.com
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1073


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
June 19, 2012, 06:50:30 AM
 #49

My proposal used heavily pruned merkle tree sections to send individual transactions without trust and had no forking.

You would check a new address really had funds by asking the network for the branches concerning just that address.

Even if the majority of nodes withheld information you would only need one honest node to get the complete picture in a short download.

How would that compare to maintaining a merkle tree with an enforced sort order: sorted by address?  Then the odds of accepting an answer from a lying node would be zero.  No need to seek a consensus.

If there were an enforced sort order, a response to a query could reliably prove he returned the entire set of records matching the query (or prove none existed), simply by providing a contiguous chunk of leaf nodes that completely contains the query, along with everything up to the merkle root.  Sort of how I could prove "Gordon Dinklebutt" is or isn't in the phone book by providing copies of contiguously numbered pages that include both "Dickson" (which is less than Dinklebutt) and "Dipweed" (which is greater), as well as everything in between.
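
A rough sketch of that check, assuming the leaves are (address, data) pairs in the enforced sort order and that each returned leaf's membership has already been verified against the trusted merkle root (a hypothetical helper of my own, not anyone's actual implementation):

Code:
def lookup_with_range_proof(sorted_leaves, query_address):
    """sorted_leaves: a contiguous, address-sorted slice of the tree's leaves whose
    first entry sorts at or below the query and whose last entry sorts at or above it.
    Returns the matching leaves; an empty result is a verified proof of absence,
    exactly like the phone-book pages bracketing "Dinklebutt"."""
    addresses = [addr for addr, _ in sorted_leaves]
    if addresses != sorted(addresses):
        raise ValueError("leaves are not in the enforced sort order")
    if not (addresses[0] <= query_address <= addresses[-1]):
        raise ValueError("returned range does not bracket the query; proof is incomplete")
    return [(a, d) for a, d in sorted_leaves if a == query_address]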

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1073


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
June 19, 2012, 07:04:22 AM
Last edit: June 19, 2012, 07:28:54 AM by casascius
 #50

If the idea of having a meta tree gained a following, then I'd almost like to throw out the following on top of it:  have two meta trees.

A first class meta tree and a second class meta tree.  The purpose of having two meta trees would be to let someone operating a P2P node offer resources to the network without having to offer unlimited resources, and to have those resources give maximum benefit to the network.  Think: a user who would rather contribute to supporting real transactions instead of spam and bitdust.

The first class meta tree would be constrained in size, and would contain the most important txouts to the network.  "Important" being determined by value as well as age.  The goal would be that everything will end up in the first class tree except for bitdust.

The second class meta tree would contain everything, including the bitdust.

Users would be able to choose the level of resource commitment to the P2P network.  In today's terms, hosting the first tree could be a comfortable 100 megabyte commitment, and you would do it for selfish reasons: fast transaction processing and minimum transaction fees (explained later).  If the first tree had a size cap (perhaps that size is algorithmically pre-determined to follow Moore's law), people could give comfortably to the network.  

The second tree - which would contain everything that didn't fit in the first tree - could help force people putting the biggest burden on the network to pay the most.  Miners would have no choice but to host both trees, but miners are also in a position to force a more generous transaction fee for access to those transactions.

Bitdust would become second-class currency, but easily upgraded to first-class by spending and consolidating it, and waiting a bit.  Owners of bitdust would bear two burdens: a greater transaction fee, and a greater confirmation time when spent to someone who chooses to be agnostic of the second tree.

Someone running a client that only concerned itself with the first-class tree would learn about incoming bitdust transactions not immediately, but only after a miner mined the transaction spending the bitdust into a block, and then that block reached a threshold of anywhere between 6 and 100 confirmations to meet the criteria for it to be integrated into the first class tree.  As long as it was spent in a manner that consolidated it into a bigger transaction instead of leaving it in the form of dust, it would remain first class currency, likely permanently.

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
Realpra
Hero Member
*****
Offline Offline

Activity: 816
Merit: 1000


View Profile
June 19, 2012, 07:04:29 AM
 #51

My proposal used heavily pruned merkle tree sections to send individual transactions without trust and had no forking.

You would check a new address really had funds by asking the network for the branches concerning just that address.

Even if the majority of nodes withheld information you would only need one honest node to get the complete picture in a short download.

How would that compare to maintaining a merkle tree with an enforced sort order: sorted by address?  Then the odds of accepting an answer from a lying node would be zero.  No need to seek a consensus.

My solution did not seek consensus, as I said once you have the branches with transactions linked to the main hash you KNOW they are true.
Even if 90% of the network is withholding the "spent it all" transaction you would easily be able to get it from just ONE honest node.

How is the alternate merkle tree even safe with no/little mining? I could make a false log, sign it with minimal mining or put it in the blockchain (both easy) and fool you all right?

Quote
If there were an enforced sort order, a response to a query could reliably prove he returned the entire set of records matching the query (or prove none existed), simply by providing a contiguous chunk of leaf nodes that completely contains the query, along with everything up to the merkle root.  Sort of how I could prove "Gordon Dinklebutt" is or isn't in the phone book by providing copies of contiguously numbered pages that include both "Dickson" (which is less than Dinklebutt) and "Dipweed" (which is greater), as well as everything in between.
That only works if the phone book is complete - what if you are not sent the most up to date altchain/log-block? What if I send you false ones?

My solution had further advantages as NO ONE would need the full block - a big advantage as miners can't run lite nodes as it is.

My solution also solved propagation time problems blocks have today - this solution would not as it relies on the normal blockchain.

All of this without forking or merged mining.

Cheap and sexy Bitcoin card/hardware wallet, buy here:
http://BlochsTech.com
Realpra
Hero Member
*****
Offline Offline

Activity: 816
Merit: 1000


View Profile
June 19, 2012, 07:07:28 AM
 #52

If the idea of having a meta tree gained a following, then I'd almost like to throw out the following on top of it:  have two meta trees.
My solution already has this in it, by looking at branches.

Each node would be able to choose the depth of the merkle tree at which it wanted to verify.

All nodes could choose different points.

Cheap and sexy Bitcoin card/hardware wallet, buy here:
http://BlochsTech.com
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1073


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
June 19, 2012, 07:18:29 AM
 #53

My solution did not seek consensus, as I said once you have the branches with transactions linked to the main hash you KNOW they are true.
Even if 90% of the network is withholding the "spent it all" transaction you would easily be able to get it from just ONE honest node.

The problem is that you still have no certain way to know if a transaction is being withheld.  An attacker controlling your upstream connectivity (an attack that happens all the time in the real world) can easily engineer your view of the network such that you only see the nodes of the attacker's choice.  I am not sure many people will view that as acceptable.  This is very important, because if someone can withhold from you the knowledge that an incoming transaction is invalid because it's a double-spend, then you're vulnerable to double-spending attacks.

How is the alternate merkle tree even safe with no/little mining? I could make a false log, sign it with minimal mining or put it in the blockchain (both easy) and fool you all right?

By imposing a requirement that the merkle root be in the main chain's coinbase.  That essentially makes it "mandatory merged mining".

Such an idea might start out as a novelty, where the tree is maintained voluntarily and can't really be relied upon, but results in a huge improvement when you choose to rely upon it, albeit at a risk.  The developers may then say, "That improvement is great, let's eliminate the risk by making it mandatory to provide the correct merkle root of the meta tree(s) rather than optional, as a condition of a block to be accepted by the network."

That only works if the phone book is complete - what if you are not sent the most up to date altchain/log-block? What if I send you false ones?

A false meta tree would be outed by its root not matching what's in the block headers of the main chain.

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
Maged
Legendary
*
Offline Offline

Activity: 1204
Merit: 1006


View Profile
June 19, 2012, 07:30:16 AM
 #54

An individual user client would likely use the tree 6 meta-blocks or more back to ensure that they are getting an accurate picture of things. Merchants would likely go back 100-1000 meta-blocks, and new miners would go back at least 10,000 meta-blocks.

As long as at least one person forever stores the entire blockchain (but maybe even not, if people trust that the meta chain was accurate for the past several years), those limits should provide plenty of warning and safety in case the meta chain gets 51% attacked.

casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1073


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
June 19, 2012, 07:36:57 AM
 #55

...and new miners would go back at least 10,000 meta-blocks...

Even if new miners chose not to do this, and successfully get tricked with a bogus view of the meta tree, their worst case is that they produce invalid blocks that become farts in the wind.  But I suppose mining pools would have a responsibility to do this.

As long as at least one person forever stores the entire blockchain (but maybe even not, if people trust that the meta chain was accurate for the past several years), those limits should provide plenty of warning and safety in case the meta chain gets 51% attacked.

And if the meta chain could be made an integral part of Bitcoin to the point that mining it was mandatory to mine Bitcoin, then the only way to 51% attack the meta chain would be to successfully 51% attack Bitcoin, which I would find comforting.

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
Realpra
Hero Member
*****
Offline Offline

Activity: 816
Merit: 1000


View Profile
June 19, 2012, 07:37:05 AM
 #56

My solution did not seek consensus, as I said once you have the branches with transactions linked to the main hash you KNOW they are true.
Even if 90% of the network is withholding the "spent it all" transaction you would easily be able to get it from just ONE honest node.

The problem is that you still have no certain way to know if a transaction is being withheld.  An attacker controlling your upstream connectivity (an attack that happens all the time in the real world) can easily engineer your view of the network such that you only see the nodes of the attacker's choice.
He can't control you if you choose to connect to say the mtgox node or some other trusted* node via public key encryption.

The attacker would be unable to understand such secure messages - provided you have a known good public key to write to.

(* Not really trusted, just ANYONE who doesn't realize what txs the attacker want withheld.)

Quote
By imposing a requirement that the merkle root be in the main chain's coinbase.  That essentially makes it "mandatory merged mining".

So that is basically a fork right? My solution doesn't have that.

Anyway, what if I am a miner and I include the signature of an incomplete log in my coinbase?

The ONLY way to tell my log was incomplete would be to download the entire chain.

Quote
Such an idea might start out as a novelty
A swarm client would be instantly useful and has similar/same programmatic complexity as this.

Heck mining pools might be richly rewarded by adopting swarm clients. (by being able to have larger pools/more txs/fees in a block)

Quote
A false meta tree would be outed by its root not matching what's in the block headers of the main chain.
I was lucky enough to put it there as a miner, so the signature has been merged mined, all inside txs are valid - I just didn't include the one where I moved a bunch of coins.

The only way to out that is by checking the entire chain yourself.

I think with every miner motivated to do the above attack my solution is safer.

Cheap and sexy Bitcoin card/hardware wallet, buy here:
http://BlochsTech.com
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1073


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
June 19, 2012, 07:46:39 AM
 #57

He can't control you if you choose to connect to say the mtgox node or some other trusted node via public key encryption.

The attacker would be unable to understand such secure messages - provided you have a known good public key to write to.

I don't see that happening any time soon, as it would be opposite Bitcoin's design goals of decentralization.

So that is basically a fork right? My solution doesn't have that.

Right, instead your well-intended solution has a double spend vulnerability easily exploited by any upstream provider that can only be mitigated by connecting to a known centralized server that you think you can "trust" (the opposite of peer-to-peer).

Anyway what if I am a miner and I include a signature of a incomplete log in my base?

The ONLY way to tell my log was incomplete would be to download the entire chain.

Yep, you are right: so long as the meta chain were experimental and non-mandatory, anybody could throw anything they want in the coinbase, including a completely falsified meta merkle root.  But consumers of the meta chain would depend on nothing that didn't have 6 confirmations (meta chain confirmations in the bitcoin block chain, not just 6 bitcoin blocks).  Your hash power would have to exceed that of those putting in honest logs; essentially you would be attempting to attack the meta chain and would need 51% of the meta chain's hash power to succeed.

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
HostFat
Staff
Legendary
*
Offline Offline

Activity: 3794
Merit: 1152


I support freedom of choice


View Profile
June 19, 2012, 07:51:25 AM
 #58

I don't completely understand all of your proposals, but I stick with the one that works in all cases under this rule: "trust no one".

I DO NOT PROVIDE PRIVATE SUPPORT
Realpra
Hero Member
*****
Offline Offline

Activity: 816
Merit: 1000


View Profile
June 19, 2012, 07:58:33 AM
 #59

He can't control you if you choose to connect to say the mtgox node or some other trusted node via public key encryption.

The attacker would be unable to understand such secure messages - provided you have a known good public key to write to.

I don't see that happening any time soon, as it would be opposite Bitcoin's design goals of decentralization.
I think you misunderstand: you don't need to connect to a TRUSTED node per se, just ANY node that is not colluding with the attacker.

ANY will do. Could even be a different ATTACKER that didn't know what the FIRST attacker wanted hidden!

As for secure communication that is pretty standard, BTC should have it already if it doesn't.

Quote
So that is basically a fork right? My solution doesn't have that.

Right, instead your well-intended solution has a double spend vulnerability easily exploited by any upstream provider that can only be mitigated by connecting to a known centralized server that you think you can "trust" (the opposite of peer-to-peer).
Read above.

Quote
But consumers of the meta chain would depend on nothing that didn't have 6 confirmations (meta chain confirmations in the bitcoin block chain, not just 6 bitcoin blocks)
.
That would be DAYS in confirmation time in the beginning, who would use that to any great extent?

Quote
Your hash power would have to exceed that of those putting in honest logs; essentially you would be attempting to attack the meta chain and would need 51% of the meta chain's hash power to succeed.
You would be relying on SOMEONE checking that all those 6 logs are complete and then what? REPORTING it if not? Dumping the entire chain?
What miner would do that for an alt chain log?

As for reporting, yep you just arrived at part 1 of my solution, welcome.

Cheap and sexy Bitcoin card/hardware wallet, buy here:
http://BlochsTech.com
Realpra
Hero Member
*****
Offline Offline

Activity: 816
Merit: 1000


View Profile
June 19, 2012, 08:01:30 AM
 #60

I'm don't understand completely all your proposals, but I stick with the one that works in all cases with this rule: "trust no one".
My solution is basically trust that 1 guy out of 1000 is honest -- or run the client you are running today, with massive lag/huge fees.

Cheap and sexy Bitcoin card/hardware wallet, buy here:
http://BlochsTech.com
Maged
Legendary
*
Offline Offline

Activity: 1204
Merit: 1006


View Profile
June 19, 2012, 08:02:03 AM
 #61

...and new miners would go back at least 10,000 meta-blocks...

Even if new miners chose not to do this, and successfully get tricked with a bogus view of the meta tree, their worst case is that they produce invalid blocks that become farts in the wind.  But I suppose mining pools would have a responsibility to do this.
I would argue that as long as new miners that are bootstrapping at any given time are only a small % of the hash power, they'd be stupid not to verify that far back. Any misplaced trust in the recent meta-blocks could cause them to create an invalid block, which would be a terrible financial loss. In fact, for this reason, many miners will likely opt to always hold the entire chain and not trust the meta-blocks at all. I consider that a good thing.

Generally, only users and merchants should be using the meta-chain to bootstrap, although I won't be that disappointed if miners eventually have to use it too, as long as they're careful.

casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1073


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
June 19, 2012, 08:06:52 AM
 #62

@realpra How will you connect to any node not controlled by an attacker if I, the attacker, control your upstream Internet and am redirecting your connection attempts to nodes I control?  You think you are connected to node X by its IP, but you are really connected to my node Y and have no way to know.

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
Maged
Legendary
*
Offline Offline

Activity: 1204
Merit: 1006


View Profile
June 19, 2012, 08:08:04 AM
 #63

Quote
But consumers of the meta chain would depend on nothing that didn't have 6 confirmations (meta chain confirmations in the bitcoin block chain, not just 6 bitcoin blocks)
.
That would be DAYS in confirmation time in the beginning, who would use that to any great extent?
Many people. Imagine if all you had to do today to bootstrap was to download a week of blocks. A low-end laptop can do that in less than an hour today, and that's without all of the code optimizations the Bitcoin implementations will have in the future, not to mention hardware.

Realpra
Hero Member
*****
Offline Offline

Activity: 816
Merit: 1000


View Profile
June 19, 2012, 08:11:20 AM
Last edit: June 19, 2012, 08:26:14 AM by Realpra
 #64

@realpra How will you connect to any node not controlled by an attacker if I the attacker control your upstream Internet and am redirecting your connection attempts to nodes I control?  You think you are connected to node X by its IP, but you are really connected to my node Y and have no way to know.
It's called SSL, I think.

Say I store the public keys for:
1. My friend Bob.
2. Guy who posted his key on a forum.
3. MtGox.

Since I am lazy that's it.

You send me invalid money and I check those nodes.

It now either becomes apparent that someone is blocking my connection OR one of them will likely NOT be colluding with you.

You can fake IPs, but you have no way to fake that you have their private keys for my encrypted communication, so my client will just display "You are under attack!!!".
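
As a rough illustration of that check (my own sketch, not Realpra's client -- he describes it in terms of SSL/public-key encryption, and signatures over the replies are one simple way to get the same property), using the third-party `cryptography` package:

Code:
from cryptography.hazmat.primitives.asymmetric import ed25519
from cryptography.exceptions import InvalidSignature

def cross_check(answer, signatures, pinned_keys):
    """answer: the serialized reply about a transaction/output (bytes).
    signatures: dict of node_name -> signature bytes over that reply.
    pinned_keys: dict of node_name -> Ed25519PublicKey stored in advance (Bob, the
    forum guy, MtGox, ...).  Returns the pinned nodes that vouched for the reply;
    no vouchers (or unreachable nodes) is the "You are under attack!!!" case."""
    vouched = []
    for name, pubkey in pinned_keys.items():
        sig = signatures.get(name)
        if sig is None:
            continue                      # node unreachable -- suspicious on its own
        try:
            pubkey.verify(sig, answer)
            vouched.append(name)
        except InvalidSignature:
            pass                          # forged or tampered reply
    return vouched

# Example: one pinned key and a correctly signed reply
priv = ed25519.Ed25519PrivateKey.generate()
pinned = {"Bob": priv.public_key()}
reply = b"output txid=...:0 still unspent as of block 186000"
assert cross_check(reply, {"Bob": priv.sign(reply)}, pinned) == ["Bob"]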

Many people. Imagine if all you had to do today to bootstrap was to download a week of blocks. A low-end laptop can do that in less than an hour today, and that's without all of the code optimizations the Bitcoin implementations will have in the future, not to mention hardware.
I mean what customer/merchant would wait days to know whether payment was made or not?

edit: A swarm client would run on a smartphone and act as a full node, btw. It could even mine a bit in a pool.

Cheap and sexy Bitcoin card/hardware wallet, buy here:
http://BlochsTech.com
DiThi
Full Member
***
Offline Offline

Activity: 156
Merit: 100

Firstbits: 1dithi


View Profile
June 19, 2012, 03:26:32 PM
Last edit: June 19, 2012, 03:42:25 PM by DiThi
 #65

You are discussing two issues that IMHO are already resolved in my proposal or a follow-up:

Efficient tree update: The update function only recalculates the hashes affected by the changes in each block. Those changes can be reversed as long as the block is valid (i.e. there are no double spends), so it is easy to roll back in the case of orphaned blocks.
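
A minimal sketch of that kind of incremental update, for a fixed-depth binary hash tree (an assumed structure -- the thread deliberately leaves the exact tree variant open): only the O(depth) hashes on the path from a changed leaf to the root are recomputed, and each update returns an undo record so an orphaned block can be rolled back.

Code:
import hashlib

def h(b):
    return hashlib.sha256(b).digest()

class SimpleHashTree:
    """Fixed-depth binary hash tree over 2**depth leaf slots."""
    def __init__(self, depth):
        self.depth = depth
        self.leaves = {}   # leaf index -> leaf hash (missing == empty slot)
        self.nodes = {}    # (level, index) -> hash, for levels 1..depth
        # Hash of an all-empty subtree at each level, so untouched branches cost nothing.
        self.empty = [h(b'')]
        for _ in range(depth):
            self.empty.append(h(self.empty[-1] + self.empty[-1]))

    def _node(self, level, index):
        if level == 0:
            return self.leaves.get(index, self.empty[0])
        return self.nodes.get((level, index), self.empty[level])

    def update_leaf(self, index, leaf_hash):
        """Set one leaf and recompute only the hashes on its path to the root.
        Returns an undo record for rolling the change back."""
        undo = [('leaf', index, self.leaves.get(index))]
        self.leaves[index] = leaf_hash
        for level in range(1, self.depth + 1):
            index //= 2
            undo.append(('node', (level, index), self.nodes.get((level, index))))
            self.nodes[(level, index)] = h(self._node(level - 1, 2 * index) +
                                           self._node(level - 1, 2 * index + 1))
        return undo

    def rollback(self, undo):
        """Reverse one update_leaf call (apply a whole block's undos in reverse order)."""
        for kind, key, old in reversed(undo):
            store = self.leaves if kind == 'leaf' else self.nodes
            if old is None:
                store.pop(key, None)
            else:
                store[key] = old

    def root(self):
        return self._node(self.depth, 0)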

Where to save the roots: In my proposal I explain how to roll it out in the coinbase of the existing chain, while nullifying the risk of a chain split by only rejecting blocks with an invalid root once more than 55% of the roots in a specific time span are valid.

For extra security (and this is what wasn't originally in my proposal), the root should be accompanied by a hash of the previous+current valid roots, effectively making a secure blockchain from day one. But after it's deployed widely, it will be unnecessary, as we'll know miners will reject blocks with invalid roots. Miners won't reject blocks without roots. Blocks with a valid root but without this blockchain-ish hash won't be rejected either (so we can drop this hash when it's no longer necessary).

In this way, creating a separate chain is just a temporary fix for something that will be in the main chain some day.

1DiThiTXZpNmmoGF2dTfSku3EWGsWHCjwt
galambo
Sr. Member
****
Offline Offline

Activity: 672
Merit: 252


View Profile
June 19, 2012, 03:44:42 PM
 #66

This idea could end up having more uses than enabling lightweight clients.

For instance, forking the main blockchain is practically impossible today. Even if someone came around with worthwhile changes to the storage subsystem or the scripting subsystem, we could never implement it. The moment the two chains get out of sync, you need two copies of the blockchain .dat that are mostly identical.

With this proposal a certain "snapshot" in the metachain could be specified as the branch point for the blockchain. This snapshot could be used to refer back to the legacy system.

The proposal would allow experiments and tests using the real chain. The developers have been somewhat paralyzed: they cannot change many things in the implementation because there's not really any way to deploy such a change.

If one of these experimental branches became popular enough, a new branch could be created on the official branch with ample notice to all users.

Also, having a chain of snapshots would allow the network to avoid new and unforeseen attacks. If one user managed to do something detrimental in the block chain to his advantage and every other user's disadvantage (like a sustained 51% attack, or an exploit), the community could achieve a consensus to "go back in time" to a previous snapshot with a patched client.
DiThi
Full Member
***
Offline Offline

Activity: 156
Merit: 100

Firstbits: 1dithi


View Profile
June 19, 2012, 04:03:50 PM
 #67

About rolling out new features and avoiding blockchain splits, what we need is a good system to automatically and democratically add any feature -- just like the implementation schedule of P2SH, but more like my proposal: time-flexible, with an additional temporary sub-chain, and for any feature. It may be difficult and problematic to code it for only one feature, but IMHO it's worth it if it's a generic implementation-deprecation system for determining the validity of blocks.

1DiThiTXZpNmmoGF2dTfSku3EWGsWHCjwt
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1073


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
June 19, 2012, 04:17:30 PM
 #68

It's called SSL, I think.

I would be pretty surprised if nodes started identifying themselves through SSL certificates.

That said however, what it looks like you have proposed is tiers of nodes and a structure that includes supernodes.  I actually agree with you that such a structure will be critical to scalability of the network.

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
etotheipi
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1048


Core Armory Developer


View Profile WWW
June 19, 2012, 04:20:01 PM
 #69

You are discussing two issues that IMHO are already resolved in my proposal or a follow-up:

Efficient tree update: The update function only recalculates the hashes affected by the changes in each block. Those changes can be reversed as long as the block is valid (i.e. there are no double spends), so it is easy to roll back in the case of orphaned blocks.

Where to save the roots: In my proposal I explain how to roll it out in the coinbase of the existing chain, while nullifying the risk of a chain split by only rejecting blocks with an invalid root once more than 55% of the roots in a specific time span are valid.

For extra security (and this is what wasn't originally in my proposal), the root should be accompanied by a hash of the previous+current valid roots, effectively making a secure blockchain from day one. But after it's deployed widely, it will be unnecessary, as we'll know miners will reject blocks with invalid roots. Miners won't reject blocks without roots. Blocks with a valid root but without this blockchain-ish hash won't be rejected either (so we can drop this hash when it's no longer necessary).

In this way, creating a separate chain is just a temporary fix for something that will be in the main chain some day.

DiThi,

I see this from a different angle.  

(1) The tree-part of my proposal should be seen as an extension of yours.  I'm sure my idea was inspired from reading yours a long time ago.  The difference being that extra complexity is added to the tree structure to accommodate the most common use-case:  requesting address balances.  My tree structure guarantees that you can not only get any TxOut, but you can get all TxOuts for a given address/script and have no doubts that it's correct.

I believe this is a worthy trade-off, compared to your tree structure, as it removes a channel of uncertainty for the operator, and removes a channel for shenanigans from those who wish to deceive you.  And in the end, it's not actually that much more complicated.  It's simply more tailored to the way that users need to access the network.
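
For concreteness, a hedged sketch of the two-level commitment as I read it (my own toy code, not the author's): the outer tree is keyed by address/script, and each outer leaf commits to the merkle root of that address's own unspent-TxOut list, so one branch plus one sub-tree proves a complete per-address balance.

Code:
import hashlib

def dhash(b):
    return hashlib.sha256(hashlib.sha256(b).digest()).digest()

def merkle_root(items):
    """Plain merkle root over a list of byte strings (duplicating the last entry on
    odd-sized levels, the way Bitcoin's block merkle trees do)."""
    if not items:
        return dhash(b'')
    layer = [dhash(i) for i in items]
    while len(layer) > 1:
        if len(layer) % 2:
            layer.append(layer[-1])
        layer = [dhash(layer[i] + layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0]

def master_root(unspent_by_address):
    """unspent_by_address: dict of address/script string -> list of serialized
    unspent TxOuts.  Outer leaves are (address, per-address root), sorted by address."""
    leaves = [addr.encode() + merkle_root(txouts)
              for addr, txouts in sorted(unspent_by_address.items())]
    return merkle_root(leaves)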

(2)  As echoed by others, I believe that a hard-forking blockchain change is going to only happen in the event of a crisis.  To do so requires more than democracy -- it will seriously impact the entire network in a detrimental way.  There are users who are still using version 0.3.X bitcoin clients not because they want to, but because it works, and they don't follow the forums or Bitcoin news or anything of the sort.  And the hard fork exposes them to all sorts of malicious behavior by others who would exploit their ignorance of current events and manipulate the abandoned chain that they are stuck on.

To maintain confidence in the system, a hard fork is going to need more than democracy -- it's going to need super-majority, probably 80-90% ... and gaining that level of consensus is pretty much impossible for new ideas that are not well-understood -- unless the idea has been in the wild, and in use for many months/years and is already used by 80%+ people.

The idea of using a second blockchain is actually a way of creating a "staging area" for such ideas on the main network (like galambo said) without actually risking exposing that network to any of the unforeseen issues that could arise.  It can be used to add such functionality to the network without actually changing the network.

In this way, the meta-chain can grow and develop as people start using it and understanding it.  People start building infrastructure on the availability of the information in that chain.  Once it has become ubiquitous enough and time-tested as a pillar of a part of the network, then you have 80%+ agreement amongst users without even having to ask for it.  At this point, a hard-fork is entirely feasible -- or at least orders of magnitude less disruptive.

You're right, it's not the only way, but I think it's about as good as it's going to get.

Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
DiThi
Full Member
***
Offline Offline

Activity: 156
Merit: 100

Firstbits: 1dithi


View Profile
June 19, 2012, 05:17:47 PM
 #70

My tree structure guarantees that you can not only get any TxOut, but you can get all TxOuts for a given address/script and have no doubts that it's correct.

I always saw that as a separate issue/feature of my proposal (i.e. not necessary for starting and deploying an implementation), which also makes things simpler. Sometimes (actually most of the time) you just need to know an output hasn't been spent. If you need the balance and someone gives you a list of outputs, you can be sure those outputs are unspent; the only thing remaining is knowing whether *all* the outputs were given to you.

That's easy to solve. I'm thinking of several solutions that don't require full nodes to build and verify the tree. For example, having a separate tree, address-based instead of chain-based, which just stores the number of unspent outputs (removing the key when the value reaches 0).
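
A tiny sketch of the completeness check such a count-tree would enable (hypothetical helper names; proving the count itself against that tree's root is not shown):

Code:
def balance_if_complete(claimed_outputs, proven_count):
    """claimed_outputs: list of (txid, index, value) a peer says are ALL of an
    address's unspent outputs, each individually proven unspent elsewhere.
    proven_count: the count for that address taken from the verified count-tree."""
    if len(claimed_outputs) != proven_count:
        raise ValueError("peer omitted some outputs -- the list is incomplete")
    return sum(value for _txid, _idx, value in claimed_outputs)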

Initially, though, we can just query several nodes to give us the count of unspent outputs and trust the majority.

1DiThiTXZpNmmoGF2dTfSku3EWGsWHCjwt
etotheipi
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1048


Core Armory Developer


View Profile WWW
June 19, 2012, 05:26:32 PM
 #71

My tree structure guarantees that you can not only get any TxOut, but you can get all TxOuts for a given address/script and have no doubts that it's correct.

I always saw that as a separate issue/feature of my proposal (i.e. not necessary for starting and deploying an implementation), which also makes things simpler. Sometimes (actually most of the time) you just need to know an output hasn't been spent. If you need the balance and someone gives you a list of outputs, you can be sure those outputs are unspent; the only thing remaining is knowing whether *all* the outputs were given to you.

That's easy to solve. I'm thinking of several solutions that don't require full nodes to build and verify the tree. For example, having a separate tree, address-based instead of chain-based, which just stores the number of unspent outputs (removing the key when the value reaches 0).

Initially, though, we can just query several nodes to give us the count of unspent outputs and trust the majority.

Well, that's where we differ in opinion.  Majority peer-influence is cheap relative to majority mining power.  That's not to say it's an easy exploit, or that it would be in any way worth it.  But I see it as a source of uncertainty, and a channel waiting to be exploited in some way we haven't thought about.  I think the added complexity is well worth closing the "hole" completely.  Though not everyone feels it's actually a hole.

I personally think it makes more sense anyway -- you can still get a single TxOut with O(log(N)+log(M)) if you really want it -- but most of the time, it would be new nodes hopping on the network with imported wallets that simply want to get their balance.  This tree structure takes that use case into account directly and doesn't leave a shred of uncertainty that they got the right answer.




Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
Realpra
Hero Member
*****
Offline Offline

Activity: 816
Merit: 1000


View Profile
June 19, 2012, 06:15:12 PM
 #72

It's called SSL, I think.

I would be pretty surprised if nodes started identifying themselves through SSL certificates.
I would also be surprised if someone's upstream connection was stolen Wink

At least with SSL, it would only happen once before an SSL update was made.

Quote
That said however, what it looks like you have proposed is tiers of nodes and a structure that includes supernodes.  I actually agree with you that such a structure will be critical to scalability of the network.
No, my structure could theoretically operate entirely with swarm clients.

However in the case of mining pools you might have one node orchestrating which hash will be worked on.

The guy who started this thread has super nodes; I don't. Maybe that's where the confusion came from.

I don't like super nodes, I think it's bad centralized design.

Quote
DiThi
Could you give me a link to your proposal?

Cheap and sexy Bitcoin card/hardware wallet, buy here:
http://BlochsTech.com
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1073


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
June 19, 2012, 07:01:46 PM
 #73

I would also be surprised if someone's upstream connection was stolen Wink

Perhaps you are unfamiliar with how the Internet works in places like Iran and China, where not only do they do MITM attacks on their citizens, they coerce SSL certificate providers to issue them bogus certificates so their citizens will be caught unaware.

Bitcoin needs to work there, too.

http://www.bgr.com/2011/08/30/iranian-government-said-to-be-using-mitm-hack-to-spy-on-gmail-other-google-services/

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
unclescrooge
aka Raphy
Hero Member
*****
Offline Offline

Activity: 868
Merit: 1000


View Profile
June 19, 2012, 07:56:05 PM
 #74

About rolling out new features and avoiding block chain splits, what we need is a good system to automatically and democratically add any feature. Just like the implementation schedule of p2sh, but more like my proposal: time-flexible, with an additional temporal sub-chain, and for any feature. It may be difficult and problematic to code it only for one feature, but IMHO it's worth it if it's a generic implementation-deprecation system for determining the validity of blocks.

Maybe I'm misunderstanding the change you're talking about, but I think this is dangerous. I use Bitcoin because I trust the protocol behind it to never change. If the majority or the devs can push a change in the protocol, then I'm out. A way to compress the blockchain, fine. A fork, hard or soft... mmmmm, seems dangerous to me.
galambo
Sr. Member
****
Offline Offline

Activity: 672
Merit: 252


View Profile
June 19, 2012, 10:02:10 PM
 #75


(2)  As echoed by others, I believe that a hard-forking blockchain change is going to only happen in the event of a crisis.  To do so requires more than democracy -- it will seriously impact the entire network in a detrimental way.  There are users who are still using version 0.3.X bitcoin clients not because they want to, but because it works, and they don't follow the forums or Bitcoin news or anything of the sort.  And the hard fork exposes them to all sorts of malicious behavior by others who would exploit their ignorance of current events and manipulate the abandoned chain that they are stuck on.

To maintain confidence in the system, a hard fork is going to need more than democracy -- it's going to need a super-majority, probably 80-90% ... and gaining that level of consensus is pretty much impossible for new ideas that are not well-understood -- unless the idea has been in the wild and in use for many months/years and is already used by 80%+ of people.

The idea of using a second blockchain is actually a way of creating a "staging area" for such ideas on the main network (like galambo said) without actually risking exposing that network to any of the unforeseen issues that could arise.  It can be used to add such functionality to the network without actually changing the network.

In this way, the meta-chain can grow and develop as people start using it and understanding it.  People start building infrastructure on the availability of the information in that chain.  Once it has become ubiquitous enough and time-tested as a pillar of the network, then you have 80%+ agreement amongst users without even having to ask for it.  At this point, a hard-fork is entirely feasible -- or at least orders of magnitude less disruptive.

You're right, it's not the only way, but I think it's about as good as it's going to get.

Thank you for your feedback. I wasn't quite sure if people would agree that this could help automate the BIP process.


Maybe I'm misunderstanding the change you're talking about, but I think this is dangerous. I use Bitcoin because I trust the protocol behind it to never change. If the majority or the devs can push a change in the protocol, then I'm out. A way to compress the blockchain, fine. A fork, hard or soft... mmmmm, seems dangerous to me.

The only way you would notice this kind of fork is if you applied the experimental patch to your Bitcoin client. Think about it like another "test net."
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1073


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
June 19, 2012, 10:26:00 PM
 #76

I actually disagree that a hard fork would be required to implement this.  A simple majority of mining power would be enough.  New blocks meeting the new requirements would still be valid blocks to the old clients, the only change being that the majority of miners would work to orphan blocks not containing the proper meta tree root, so miners mining with an old client would have an impossible time getting any blocks.

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
cantor
Newbie
*
Offline Offline

Activity: 31
Merit: 0



View Profile
June 20, 2012, 01:46:43 AM
 #77

So as I understand it, the issue here is to build a meta-chain that would contain digests of the contents of the main blockchain, in a way that would allow a lite-client to query a server for information stored in the blockchain, and use the meta-chain to verify the answer.  And even the meta-chain itself would be constructed in such a way that it can be partially queried and verified using only its root hash, meaning the lite-client would only need a bounded amount of storage.

I hope I'm doing a good summary of what is being discussed in this thread, considering that it's pretty late at night here Tongue  At any rate, subscribing, this looks really interesting.
etotheipi
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1048


Core Armory Developer


View Profile WWW
June 20, 2012, 02:04:14 AM
Last edit: June 20, 2012, 02:26:27 AM by etotheipi
 #78

So as I understand it, the issue here is to build a meta-chain that would contain digests of the contents of the main blockchain, in a way that would allow a lite-client to query a server for information stored in the blockchain, and use the meta-chain to verify the answer.  And even the meta-chain itself would be constructed in such a way that it can be partially queried and verified using only its root hash, meaning the lite-client would only need a bounded amount of storage.

I hope I'm doing a good summary of what is being discussed in this thread, considering that it's pretty late at night here Tongue  At any rate, subscribing, this looks really interesting.

Yeah, fairly accurate.  I'll re-summarize here because my view of my own proposal has evolved over discussions of the last few days, so I figured it was a good time to restate it, anyway Smiley


The first goal is blockchain pruning:  the amount of information needed to store the entire state of the network at any point in time is much less than the entire blockchain history.  You can basically just save the "outer surface" of the transaction map instead of the entire history and still do full validation.  

So I propose a structure that achieves this compression, and further organizes it to accommodate a specific, common problem we want to solve anyway:  a new node gets on the network with its imported wallet and doesn't need the whole chain, but would like to get a complete history of its own transactions in a verifiable manner.  I argue that with a more-straightforward "snapshot" tree, there's still room for deception by malicious peers, albeit not a whole lot.

Either way, I believe it's possible that new nodes can use a structure like this one to get up and running with full confidence in less than 50 MB of downloading, even in the far future.  And such a solution will be necessary, so let's hash it out now...

However, for lite-nodes to reliably use the new information, there must be some kind of enforcement that miners provide correct answers when updating the root node.  This could be done by hard-forking the network by changing the headers to require a valid root node, soft-forking by requiring a specific tx or coinbase script to contain a valid root, or as I propose: create a separate chain solely to allow mining power to "vote" on the correct snapshot root.  Such a solution would then be completely optional and transparent to anyone who doesn't know or care about it -- though I would expect most miners and developers would be anxious to leverage it.

As galambo brought up -- the alt-/meta-chain idea is a kind of "staging area" for this new piece of the protocol.  Once the community starts using it and becomes dependent on the information it provides, it could be integrated into the main chain (via hard- or soft-forking) as it would have super-majority support at that point.
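
To make the lite-node side concrete: once a client trusts a snapshot root (taken from the merge-mined meta-chain headers it has verified), checking an untrusted peer's answer is just folding the supplied Merkle branch from the leaf up and comparing against that root. A minimal sketch, assuming double-SHA256 and the usual index-bit sibling convention -- nothing here is a wire format from this proposal.

Code:
import hashlib

def h(b: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(b).digest()).digest()

def fold_branch(leaf_hash: bytes, index: int, branch: list[bytes]) -> bytes:
    # Recompute the root from a leaf hash and its sibling path. `index`
    # encodes, bit by bit, whether the running hash is the left or right
    # child at each level.
    acc = leaf_hash
    for sibling in branch:
        if index & 1:                 # we are the right child at this level
            acc = h(sibling + acc)
        else:                         # we are the left child at this level
            acc = h(acc + sibling)
        index >>= 1
    return acc

def verify_address_snapshot(trusted_root: bytes, leaf_hash: bytes,
                            index: int, branch: list[bytes]) -> bool:
    # What a lite node would do with an untrusted peer's answer: fold the
    # supplied branch and compare against the root it already trusts.
    return fold_branch(leaf_hash, index, branch) == trusted_root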





Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
2112
Legendary
*
Offline Offline

Activity: 2128
Merit: 1042



View Profile
June 20, 2012, 02:12:34 AM
 #79

So as I understand it, the issue here is to build a meta-chain that would contain digests of the contents of the main blockchain,
This isn't a good summary.

The issues are:

1) saving local storage space by purging spent transaction info, but at the same time maintaining cryptographic verifiability of the stored info.

2) augmenting the p2p protocol such that, in order to participate in the network, a client doesn't have to start from the genesis block and work all the way to now, but can start at now (or the recent past) and go back in time only to the oldest coin dealt with in the transaction, not all the way back to the genesis.

3) relaxing the original peer-to-peer protocol to allow at least partial parasite-to-peer operation, where a parasite is a pretend-peer that doesn't fully verify the relayed information but just repeats the latest rumors really fast. The goal is to limit the possible damage caused by such network rumormongers.

My description probably isn't clearer but it is closer to the truth.

Please comment, critique, criticize or ridicule BIP 2112: https://bitcointalk.org/index.php?topic=54382.0
Long-term mining prognosis: https://bitcointalk.org/index.php?topic=91101.0
doobadoo
Sr. Member
****
Offline Offline

Activity: 364
Merit: 250


View Profile
June 20, 2012, 02:20:00 AM
 #80

Can compression be used as an intermediate step along the way?  That is, is the blockchain stored on disk in an efficient manner?  Also, why do noobs have to download the whole block chain from peers?  It's soooo slow.  Couldn't each release come with a zipped copy of the blockchain up to the date it was released, along with a hard-coded hash of that block chain up till then and a built-in check?  That way the user could just download the whole shebang in one swoop.

These of course are not permanent measures, but maybe interim fixes for now.

"It is, quite honestly, the biggest challenge to central banking since Andrew Jackson." -evoorhees
tevirk
Newbie
*
Offline Offline

Activity: 15
Merit: 0



View Profile
June 20, 2012, 05:23:04 AM
 #81

Because most of the data in the block chain is hashes, it's not compressible at all.
maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1006


View Profile
June 20, 2012, 07:38:46 AM
 #82

Can compression be used as an intermediate step along the way?  That is, is the blockchain stored on disk in an efficient manner?  Also, why do noobs have to download the whole block chain from peers?  It's soooo slow.  Couldn't each release come with a zipped copy of the blockchain up to the date it was released, along with a hard-coded hash of that block chain up till then and a built-in check?  That way the user could just download the whole shebang in one swoop.

These of course are not permanent measures, but maybe interim fixes for now.
Most of the sync time is spent doing ECDSA verification and disk I/O. Compression would actually make the problem worse. Packaging the block chain up to the latest checkpoint isn't a bad idea, but won't improve the situation as much as you probably think. The client would still have to verify the block chain, which means hours on end of 100% CPU utilization.

The real solution is to develop a way to avoid verification of historical data without compromising the security that verification provides. That's what this thread is about.

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
jojkaart
Member
**
Offline Offline

Activity: 97
Merit: 10


View Profile
June 20, 2012, 05:49:03 PM
 #83

About rolling out new features and avoiding block chain splits, what we need is a good system to automatically and democratically add any feature. Just like the implementation schedule of p2sh, but more like my proposal: time-flexible, with an additional temporal sub-chain, and for any feature. It may be difficult and problematic to code it only for one feature, but IMHO it's worth it if it's a generic implementation-deprecation system for determining the validity of blocks.

Maybe I'm misunderstanding the change you're talking about, but I think this is dangerous. I use Bitcoin because I trust the protocol behind it to never change. If the majority or the devs can push a change in the protocol, then I'm out. A way to compress the blockchain, fine. A fork, hard or soft... mmmmm, seems dangerous to me.

There is at least one guaranteed hard fork that is going to happen eventually. Blocks are currently limited to a maximum of 1 megabyte in size. This could start hampering further growth of Bitcoin usage sometime next year.

However, this doesn't mean Bitcoin devs can just push any change they want. Users and miners can always opt to not use whatever new version they put out.

In short, getting any change out needs a massive majority support from Bitcoin users to happen.
unclescrooge
aka Raphy
Hero Member
*****
Offline Offline

Activity: 868
Merit: 1000


View Profile
June 20, 2012, 06:06:14 PM
 #84

About rolling out new features and avoiding block chain splits, what we need is a good system to automatically and democratically add any feature. Just like the implementation schedule of p2sh, but more like my proposal: time-flexible, with an additional temporal sub-chain, and for any feature. It may be difficult and problematic to code it only for one feature, but IMHO it's worth it if it's a generic implementation-deprecation system for determining the validity of blocks.

Maybe I'm misunderstanding the change you're talking about, but I think this is dangerous. I use Bitcoin because I trust the protocol behind it to never change. If the majority or the devs can push a change in the protocol, then I'm out. A way to compress the blockchain, fine. A fork, hard or soft... mmmmm, seems dangerous to me.

There is at least one guaranteed hard fork that is going to happen eventually. Blocks are currently limited to a maximum of 1 megabyte in size. This could start hampering further growth of Bitcoin usage sometime next year.

However, this doesn't mean Bitcoin devs can just push any change they want. Users and miners can always opt to not use whatever new version they put out.

In short, getting any change out needs a massive majority support from Bitcoin users to happen.

Do we need a fork for block size? (sorry, I don't know much about this)
jojkaart
Member
**
Offline Offline

Activity: 97
Merit: 10


View Profile
June 20, 2012, 07:06:04 PM
 #85

Do we need a fork for block size? (sorry, I don't know much about this)

Yes, it will be a fork, because all current nodes on the network will ignore any block bigger than 1 megabyte. The best way to actually do this is to set a date (well, a block number in practice), say a year or two in the future, and release a version that will keep rejecting blocks bigger than one megabyte until that block number is reached. After that it will accept blocks bigger than one megabyte.

Then, when a block that is bigger than one megabyte is created, the nodes with client versions after the update was made will accept the block and all the older versions will reject it. In practice, there's unlikely to be a real fork unless a significant portion of users refuse to upgrade.
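
A tiny sketch of that flag-day rule, with a placeholder height and a placeholder new limit (neither is a real schedule, just an illustration of the validity check described above):

Code:
MAX_BLOCK_SIZE_OLD = 1_000_000          # current 1 MB rule
HYPOTHETICAL_FORK_HEIGHT = 300_000      # placeholder block number, not a real schedule
HYPOTHETICAL_NEW_LIMIT = 8_000_000      # placeholder new cap

def block_size_ok(height: int, block_size: int) -> bool:
    # Keep enforcing the old cap until the pre-announced height, then switch.
    limit = MAX_BLOCK_SIZE_OLD if height < HYPOTHETICAL_FORK_HEIGHT else HYPOTHETICAL_NEW_LIMIT
    return block_size <= limit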

- Joel
Mageant
Legendary
*
Offline Offline

Activity: 1145
Merit: 1001



View Profile WWW
June 20, 2012, 08:10:15 PM
Last edit: June 20, 2012, 08:30:57 PM by Mageant
 #86

You wouldn't have to update the meta-chain with every individual new block that comes out of the regular blockchain.

The lightweight client that uses the meta-chain could still store a small portion of the regular blockchain, something like the latest 100 blocks (max. 100 MB), alternatively this could be a variable of storage space.

Only updating the meta-chain every 100 blocks/100 MB or so would IMHO reduce the load on the meta-chain and the lightweight clients using it.

This also avoids any synchronization problems with the latest blocks out of the blockchain and the possibility of small forks (orphaned blocks) in the blockchain, since the meta-chain would then only synchronize with blocks that have a high number of confirmations.

cjgames.com
jojkaart
Member
**
Offline Offline

Activity: 97
Merit: 10


View Profile
June 20, 2012, 08:33:24 PM
 #87

You wouldn't have to update the meta-chain with every individual new block that comes out of the regular blockchain.

The lightweight client that uses the meta-chain could still store a small portion of the regular blockchain, something like the latest 100 blocks (max. 100 MB), alternatively this could be a variable of storage space.

Only updating the meta-chain every 100 blocks/100 MB or so would IMHO reduce the load on the meta-chain and the lightweight clients using it.

This also avoids any synchronization problems with the latest blocks out of the blockchain and the possibility of small forks (orphaned blocks) in the blockchain, since the meta-chain would then only synchronize with blocks that have a high number of confirmations.

I don't see how it would reduce the load any to only update the meta-chain every 100 blocks. It'd just concentrate the update load on certain blocks. The planned tree structure for this would allow O(M*log N) updates, where M is the number of updated transactions and N is the total number of transactions in the tree.
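
A toy illustration of that O(M*log N) batch behavior, assuming a plain array-backed Merkle tree with a power-of-two number of leaves. None of this is the structure actually being proposed here; it only shows the idea of marking touched paths dirty and recomputing the affected internal hashes once.

Code:
import hashlib

def h(b: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(b).digest()).digest()

class BatchMerkle:
    # Array-backed Merkle tree over a power-of-two number of leaves.
    # set_leaf() only marks the touched path dirty; flush() recomputes the
    # dirty internal nodes once, so a batch of M changes costs ~M*log(N).
    def __init__(self, leaves: list[bytes]):
        n = len(leaves)
        assert n >= 2 and (n & (n - 1)) == 0, "power-of-two leaf count for simplicity"
        self.n = n
        self.nodes = [b""] * n + [h(x) for x in leaves]   # 1-based heap layout
        for i in range(n - 1, 0, -1):
            self.nodes[i] = h(self.nodes[2 * i] + self.nodes[2 * i + 1])
        self.dirty = set()

    def set_leaf(self, index: int, value: bytes) -> None:
        self.nodes[self.n + index] = h(value)
        self.dirty.add((self.n + index) // 2)             # mark the parent dirty

    def flush(self) -> bytes:
        frontier = set(self.dirty)
        while frontier:                                   # one tree level at a time
            nxt = set()
            for i in frontier:
                self.nodes[i] = h(self.nodes[2 * i] + self.nodes[2 * i + 1])
                if i > 1:
                    nxt.add(i // 2)
            frontier = nxt
        self.dirty.clear()
        return self.nodes[1]                              # new root

# Usage: touch a handful of leaves, then pay the hashing cost once.
tree = BatchMerkle([bytes([i]) * 32 for i in range(8)])
tree.set_leaf(3, b"spent marker")
tree.set_leaf(5, b"new output")
new_root = tree.flush()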
Mageant
Legendary
*
Offline Offline

Activity: 1145
Merit: 1001



View Profile WWW
June 20, 2012, 08:48:37 PM
 #88

You wouldn't have to update the meta-chain with every individual new block that comes out of the regular blockchain.

The lightweight client that uses the meta-chain could still store a small portion of the regular blockchain, something like the latest 100 blocks (max. 100 MB), alternatively this could be a variable of storage space.

Only updating the meta-chain every 100 blocks/100 MB or so would IMHO reduce the load on the meta-chain and the lightweight clients using it.

This also avoids any synchronization problems with the latest blocks out of the blockchain and the possibility of small forks (orphaned blocks) in the blockchain, since the meta-chain would then only synchronize with blocks that have a high number of confirmations.

I don't see how it would reduce the load any to only update the meta-chain every 100 blocks. It'd just concentrate the update load on certain blocks. The planned tree structure for this would allow O(M*log N) updates, where M is number of updated transactions and N is the total number of transactions in the tree.

Because certain outputs could already have been re-spent within the last 100 blocks, so you could cut those out.

Also there is the issue of avoiding blockchain forks that get orphaned, so the lightweight client would store a small portion of the regular blockchain anyway, say the last X blocks, where X is the number of blocks after which you can be relatively sure they won't get orphaned. I don't know if avoiding orphaned blocks is very important though; maybe it's not.

cjgames.com
apetersson
Hero Member
*****
Offline Offline

Activity: 668
Merit: 501



View Profile
June 20, 2012, 09:43:48 PM
 #89

Your nice graphs hosted on Dropbox are no longer working. Maybe upload them to imgur.com?
Realpra
Hero Member
*****
Offline Offline

Activity: 816
Merit: 1000


View Profile
June 20, 2012, 10:44:43 PM
 #90

The lightweight client that uses the meta-chain could still store a small portion of the regular blockchain, something like the latest 100 blocks (max. 100 MB), alternatively this could be a variable of storage space.

Only updating the meta-chain every 100 blocks/100 MB or so would IMHO reduce the load on the meta-chain and the lightweight clients using it.
Seems like another flaw in this design to me.

At VISA volumes I think each block would be about one gigabyte. Even at half that or less, it would break the light nodes proposed here.

This solution is patchwork; a swarm client combined with the ledger system is the only way. It doesn't have to be my design, but we will benefit a lot from swarm principles at some point.

Cheap and sexy Bitcoin card/hardware wallet, buy here:
http://BlochsTech.com
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1073


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
June 20, 2012, 11:11:56 PM
 #91

I am the one who originally suggested the 100 block interval... but I don't think I said that updating the meta tree only every 100 blocks is what should be done.

Rather, the meta tree should be rebalanced every 100 blocks, and between then, nodes should be added and deleted using methodology that avoids (procrastinates) having to recalculate the hashes for most or all the nodes in the tree any time there was a change.  Otherwise, every incoming transaction will have a huge CPU time burden that's not sustainable.  Rebalancing the tree is much like rebuilding a database index.

The reason for 100 blocks is that there needs to be an agreement that everybody will do it at a certain time simultaneously, so that as hashes of the tree are exchanged, they will always refer to the same data set.  An arbitrary number must be chosen that strikes a balance between the resource burden of rebalancing the tree (which favors rebalancing less frequently) and the burden required of someone to get themselves up to speed (which favors rebalancing more frequently).

And while the meta tree may in fact be large, no node ever needs to accumulate older copies of the meta tree, so it is not as though having a 1GB meta tree is going to mean 1 GB for every 100 blocks.  There is value to having a few old revisions of the meta tree (e.g. so that a node can opt to formulate its view of the network starting a certain number of blocks back and protect itself from orphan blocks) but there is no reason, for example, anyone needs to accumulate this meta data for historical purposes, as it is completely reconstructable from the normal block chain.

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
tevirk
Newbie
*
Offline Offline

Activity: 15
Merit: 0



View Profile
June 21, 2012, 06:10:25 AM
 #92

Sorry to be slow, but I don't see the gain here.  If a lightweight client is going to trust that a metablock that's been merged into the chain is truthful (because it's been built into a block), then it can just as reliably trust that a transaction that's in the chain a few blocks back is valid, because it's been built into a block.  There's no need for it to keep anything.  The only real advantage here seems to be that it saves miners from having to have a hard disk, and it seems like a lot of engineering to do that.

Quite possibly I'm missing something, in which case it would probably help for someone to step back and explain the aims and benefits.
maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1006


View Profile
June 21, 2012, 08:00:53 AM
 #93

I am the one who originally suggested the 100 block interval... but I don't think I said that updating the meta tree only every 100 blocks is what should be done.

Rather, the meta tree should be rebalanced every 100 blocks, and between then, nodes should be added and deleted using methodology that avoids (procrastinates) having to recalculate the hashes for most or all the nodes in the tree any time there was a change.  Otherwise, every incoming transaction will have a huge CPU time burden that's not sustainable.  Rebalancing the tree is much like rebuilding a database index.
I'm not sure I follow. Updating any tree (balanced or not) is a constant-time operation. Updating any Merkle-tree (balanced or not) is log(N), although you can save a little effort by marking updated nodes as dirty and only updating Merkle hashes at the end. Rebuilding a database is N*log(N), a different beast. Anyway that's beside the point. Updating a balanced Merkle tree shouldn't be any more complex, algorithmically, than leaving it unbalanced. Unless I'm missing something; am I?

Sorry to be slow, but I don't see the gain here.  If a lightweight client is going to trust that a metablock that's been merged into the chain is truthful (because it's been built into a block), then it can just as reliably trust that a transaction that's in the chain a few blocks back is valid, because it's been built into a block.  There's no need for it to keep anything.  The only real advantage here seems to be that it saves miners from having to have a hard disk, and it seems like a lot of engineering to do that.

Quite possibly I'm missing something, in which case it would probably help for someone to step back and explain the aims and benefits.
Wrong problem. It saves the lightweight client from having to download, verify, and keep track of any of the block chain at all, except for those parts the user cares about (their own unspent outputs, for example).

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1073


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
June 21, 2012, 04:01:12 PM
 #94

Sorry to be slow, but I don't see the gain here.  If a lightweight client is going to trust that a metablock that's been merged into the chain is truthful (because it's been built into a block), then it can just as reliably trust that a transaction that's in the chain a few blocks back is valid, because it's been built into a block.  There's no need for it to keep anything.  The only real advantage here seems to be that it saves miners from having to have a hard disk, and it seems like a lot of engineering to do that.

Quite possibly I'm missing something, in which case it would probably help for someone to step back and explain the aims and benefits.

That works if, as a lightweight node, you plan only on receiving funds that have a very small number of confirmations, which eliminates your view of the majority of bitcoins that exist.  In USD terms, this would be like limiting yourself to only being able to accept crisp dollar bills that have never been handled more than once or twice.  More likely than not, you're going to need to be able to receive funds from anybody, which will have been confirmed anywhere on the block chain between the genesis block and now.  You either need the whole block chain to know whether a given incoming transaction is valid, or at least the digested tree of all unspent txouts for the entire block chain.

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1073


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
June 21, 2012, 04:07:48 PM
 #95

I am the one who originally suggested the 100 block interval... but I don't think I said that updating the meta tree only every 100 blocks is what should be done.

Rather, the meta tree should be rebalanced every 100 blocks, and between then, nodes should be added and deleted using methodology that avoids (procrastinates) having to recalculate the hashes for most or all the nodes in the tree any time there was a change.  Otherwise, every incoming transaction will have a huge CPU time burden that's not sustainable.  Rebalancing the tree is much like rebuilding a database index.
I'm not sure I follow. Updating any tree (balanced or not) is a constant-time operation. Updating any Merkle-tree (balanced or not) is log(N), although you can save a little effort by marking updated nodes as dirty and only updating Merkle hashes at the end. Rebuilding a database is N*log(N), a different beast. Anyway that's beside the point. Updating a balanced Merkle tree shouldn't be any more complex, algorithmically, than leaving it unbalanced. Unless I'm missing something; am I?

My understanding is that removing leaf nodes from a sorted Merkle tree while maintaining the constraint that the tree remain sorted and balanced has the potential to cause the hash of every node to be recalculated.  Imagine going from a tree that has 513 nodes to one that has 512 nodes.  The tree will lose a whole rank, and 100% of its hashes will change.  That's an extreme case, but not far from the typical case: if you remove a leaf out of the middle and don't replace it with a placeholder, all the leaf nodes will shift left by one position to maintain the sort and balance constraints, and every parent of any node that has shifted will be recalculated.

The closer the removal is to the left side of the tree, the greater the proportion of the tree that must be recalc'd.  A recalc of a tree in the hundreds of MB or in the GBs for every incoming Bitcoin transaction would be unsustainably and unscalably expensive.  All transactions result in the spending, and therefore the deletion, of at least one leaf node, so this kind of update would be CPU-intensive for every user upon every roll of Satoshi's dice.  So my idea -- to keep the tree view consistent and keep the updating to log(N) -- would be to only balance the tree on a predetermined interval, and at any point between, use placeholders and allow leaf nodes to become branches (both of which -- especially the latter -- would make the tree no longer a proper Merkle tree) to conserve CPU resources during updates.
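
A quick way to see the effect described above, assuming a plain packed, sorted Merkle tree with duplicate-last-node padding and toy 4-byte leaves (illustrative only): count how many node hashes change when one leaf is removed and everything to its right shifts left.

Code:
import hashlib

def h(b: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(b).digest()).digest()

def levels(leaves: list[bytes]) -> list[list[bytes]]:
    # All levels of a plain Merkle tree, bottom up (odd levels duplicate the last node).
    out = [[h(x) for x in leaves]]
    while len(out[-1]) > 1:
        cur = out[-1][:]
        if len(cur) % 2:
            cur.append(cur[-1])
        out.append([h(cur[i] + cur[i + 1]) for i in range(0, len(cur), 2)])
    return out

def changed_fraction(n: int, delete_index: int) -> float:
    # Fraction of node hashes that change when one sorted leaf is removed
    # and the remaining leaves shift left to keep the tree packed.
    before = levels([i.to_bytes(4, "big") for i in range(n)])
    after = levels([i.to_bytes(4, "big") for i in range(n) if i != delete_index])
    changed = total = 0
    for lb, la in zip(before, after):
        for j, hb in enumerate(lb):
            total += 1
            if j >= len(la) or la[j] != hb:
                changed += 1
    return changed / total

print(changed_fraction(512, 0))    # delete the left-most leaf: virtually every hash changes
print(changed_fraction(512, 505))  # delete near the right edge: only a small tail changes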


Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
jojkaart
Member
**
Offline Offline

Activity: 97
Merit: 10


View Profile
June 21, 2012, 04:13:07 PM
 #96

My understanding is that removing leaf nodes from a sorted Merkle tree while maintaining the constraint that the tree remain sorted and balanced has the potential to cause the hash of every node to be recalculated.  Imagine going from a tree that has 513 nodes to one that has 512 nodes.  The tree will lose a whole rank.  That's an extreme case, but not far from the typical case: if you remove a leaf out of the middle and don't replace it with a placeholder, all the leaf nodes will shift left by one position to maintain the sort and balance constraints, and every parent of any node that has shifted will be recalculated.  The closer the removal is to the left side of the tree, the greater proportion of the tree must be recalc'd.  A recalc of a tree in the hundreds of MB or in the GB's for every incoming Bitcoin transaction would be overbearingly expensive.  All transactions result in the spending, and therefore, a deletion of at least one leaf node, so this kind of update would be CPU-intensive for every roll of Satoshi's dice.  So my idea - to keep the tree view consistent and keep the updating to log(N) would be to only balance the tree on a predetermined interval, and at any point between, use placeholders and allow leaf nodes to become branches to conserve updating resources.

Not quite true. For example, the red-black tree algorithm guarantees worst case operation of O(log N). Most branches, although they will move, will keep their hashes as they were, no need to recalculate.

- Joel
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1073


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
June 21, 2012, 04:16:19 PM
 #97


Not quite true. For example, the red-black tree algorithm guarantees worst case operation of O(log N). Most branches, although they will move, will keep their hashes as they were, no need to recalculate.

- Joel

The red-black tree is a concept I don't yet understand.  But if choosing this type of structure brings the benefit of O(log N) updates without introducing any negatives, then I'm all for it, and of course the periodic rebalance would become unnecessary.

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
EnergyVampire
Full Member
***
Offline Offline

Activity: 210
Merit: 100



View Profile
June 21, 2012, 04:56:21 PM
Last edit: June 26, 2012, 11:30:55 PM by EnergyVampire
 #98

Subscribing

Added to Watchlist: https://bitcointalk.org/index.php?topic=90136.0

etotheipi
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1048


Core Armory Developer


View Profile WWW
June 21, 2012, 05:18:11 PM
 #99


Not quite true. For example, the red-black tree algorithm guarantees worst case operation of O(log N). Most branches, although they will move, will keep their hashes as they were, no need to recalculate.

- Joel

The red-black tree is a concept I don't yet understand.  But if choosing this type of structure brings the benefit of O(log N) updates without introducing any negatives, then I'm all for it, and of course the periodic rebalance would become unnecessary.

This is a topic I've been debating with folks on IRC the past couple days.  It's clear that most tree data structures have most of the properties we want.  Red-black trees work great, but I don't like that the specific underlying structure (and thus root hash) depends on the specific order/history of insertions and deletions.  And it assumes that every red-black implementation uses the same rebalancing algorithm.  I have been given compelling reasons why this shouldn't be a problem, but I am personally not convinced yet.  Though I agree that it is probably an acceptable solution.

I'm voting for level-compressed trie structures (so, a variant of Patricia trees) which have no balance issues at all, are insert-order-independent, and give O(1) query/insert/delete.  The problem is they can have a lot of storage overhead per tree element.  I haven't done the calculation to know for sure just how bad it is.

Once I get out my next version of Armory, I will be diving into this a bit more, hopefully creating a proposal with much more specifics about tree structure, and the CONOPs (concept of operations) of the meta-chain.


Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1073


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
June 21, 2012, 05:30:18 PM
 #100

This is a topic I've been debating with folks on IRC the past couple days.  It's clear that most tree data structures have most of the properties we want.  Red-black trees work great, but I don't like that the specific underlying structure (and thus root hash) depends on the specific order/history of insertions and deletions.  And it assumes that every red-black implementation uses the same rebalancing algorithm.  I have been given compelling reasons why this shouldn't be a problem, but I am personally not convinced yet.  Though I agree that it is probably an acceptable solution.

Regardless of the tree structure chosen, why not rebuild it every 100 blocks, just for the sole purpose of having a periodic way of deterministically regenerating the tree, and to avoid mandating a continuous dependency on all prior versions of the meta tree in order to be certain that one has the "correct" permutation of the meta tree?

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1006


View Profile
June 21, 2012, 05:44:05 PM
 #101

@etothepi, what about non-standard transactions? Including IP, P2SH and future contract formats. Not all outputs can be reduced to an address. We've been speaking loosely about a tree of “addresses” but it would really have to be a tree of output scripts, so it's not going to be possible to limit search-string length for the prefix trie.

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1073


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
June 21, 2012, 05:53:47 PM
 #102

@etothepi, what about non-standard transactions? Including IP, P2SH and future contract formats. Not all outputs can be reduced to an address. We've been speaking loosely about a tree of “addresses” but it would really have to be a tree of output scripts, so it's not going to be possible to limit search-string length for the prefix trie.

I would think that all that matters is that there be a deterministic index that can be used to look it up.

P2SH has a hash.  IP, to the best of my knowledge, isn't a kind of transaction, but is just a way to produce a pubkey-based transaction (from which an address/hash can be derived).  Transaction formats yet to be invented could easily stipulate some way of being found, if a simple default of "first hash in the script, or hash of [first constant | all concatenated constants] in the script bigger than X bits, whichever comes first" didn't solve most or all cases with a single broad stroke.  (For example, if such a default didn't make sense for a future transaction type, that future transaction type could contain a field that says "My Search Key is X".)
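
A sketch of that "simple default", assuming standard Bitcoin script push encodings (0x01-0x4b push that many bytes; OP_PUSHDATA1/2/4 carry an explicit length). The 20-byte threshold and the function name are placeholders for illustration, not a proposed rule.

Code:
def first_large_push(script: bytes, min_len: int = 20):
    # Scan the output script and return the first pushed constant of at
    # least `min_len` bytes, to use as the search key; None if there isn't one.
    i = 0
    while i < len(script):
        op = script[i]
        i += 1
        if 1 <= op <= 75:                           # direct push of `op` bytes
            length = op
        elif op == 76 and i < len(script):          # OP_PUSHDATA1
            length, i = script[i], i + 1
        elif op == 77 and i + 1 < len(script):      # OP_PUSHDATA2
            length, i = int.from_bytes(script[i:i+2], "little"), i + 2
        elif op == 78 and i + 3 < len(script):      # OP_PUSHDATA4
            length, i = int.from_bytes(script[i:i+4], "little"), i + 4
        else:
            continue                                # non-push opcode, skip it
        data = script[i:i+length]
        i += length
        if len(data) >= min_len:
            return data
    return None

# For a standard pay-to-hash160 script this returns the 20-byte hash160;
# for a pay-to-pubkey script it returns the public key itself.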

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
vuce
Sr. Member
****
Offline Offline

Activity: 476
Merit: 250


View Profile
June 21, 2012, 05:56:35 PM
Last edit: June 21, 2012, 06:12:32 PM by vuce
 #103

I'm voting for level-compressed trie structures (so, a variant of Patricia trees) which have no balance issues at all, are insert-order-independent, and give O(1) query/insert/delete.
Quote
To insert a string, we search the trie until we can make no further progress. At this point we either add a new outgoing edge labeled with all remaining characters in the input string, or if there is already an outgoing edge sharing a prefix with the remaining input string, we split it into two edges (the first labeled with the common prefix) and proceed.
So new insertions go to the leaves of the tree; I think this would make it insert-order dependent - just like any other tree. I'd suggest AVL instead of R-B, since it has a lower worst-case height.
CoinLab
Sr. Member
****
Offline Offline

Activity: 270
Merit: 250


1CoinLabF5Avpp5kor41ngn7prTFMMHFVc


View Profile WWW
June 21, 2012, 08:34:05 PM
 #104

This is a very interesting idea.  Excited to see how it develops.
etotheipi
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1048


Core Armory Developer


View Profile WWW
June 21, 2012, 08:42:30 PM
 #105

@etothepi, what about non-standard transactions? Including IP, P2SH and future contract formats. Not all outputs can be reduced to an address. We've been speaking loosely about a tree of “addresses” but it would really have to be a tree of output scripts, so it's not going to be possible to limit search-string length for the prefix trie.

I was expecting that the hash of the TxOut script would be used so that all keys are exactly 32 bytes.  You could argue that's exactly what most TxOut scripts already are:  hashes of longer data fields (such as using hash160s in place of public keys), but you have to make sure the search key is strictly bounded in size if you're using a trie of some sort.

@vuce
The structure of a trie has no dependence on insert order.  Given a set of data, there is only one trie that can hold it.  The same goes for Patricia tries (which are level-compressed tries).  And given that its query, insert and delete times are based strictly on key size (which will be bounded as above), there are no balance issues at all: it always takes exactly 32 "hops" to get from the root to the leaf you want, regardless of whether you are querying, inserting or deleting.  So given fixed key-length, all those operations are actually O(1).  

On the other hand, I was hoping for a structure that wasn't too complicated, and both RB trees and Patricia tries have complicated implementations (even though the concepts behind them are fairly simple).  But if we're going to have to go with something complicated, anyway (to limit worst-case speed and time performance), then I'd have to vote for Patricia trie or variant.  Not only is it O(1)... someone brought up the very good point that updates to the tree can mostly be parallelized.  That sounds like another good property of a tree that's going to have very high update rates...

I just gotta spend some time to figure out the space overhead for storing TxOuts.  If it's going to triple the overall disk space compared to other structures, it might be worth using one of the insert-order dependent trees.

What other data structures am I missing that could be considered?  I know B-trees would be a good choice if we are going with insert-order-dependent structure:  they are easy to keep balanced with fairly simple rules for balancing.
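
To illustrate the two points above -- bounding the key by hashing the TxOut script, and insert-order independence -- here is a plain (uncompressed) byte trie with Merkle-style node hashes. It is deliberately not the level-compressed Patricia variant being argued for, and the hashing layout and names are assumptions for illustration only; the point is just that the root hash depends only on the key/value set, never on insertion order.

Code:
import hashlib

def h(b: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(b).digest()).digest()

class TrieNode:
    __slots__ = ("children", "value")
    def __init__(self):
        self.children = {}   # byte value (0..255) -> TrieNode
        self.value = None    # payload at a full 32-byte key, e.g. a per-script UTXO-list root

def insert(root: TrieNode, key: bytes, value: bytes) -> None:
    node = root
    for byte in key:
        node = node.children.setdefault(byte, TrieNode())
    node.value = value

def node_hash(node: TrieNode) -> bytes:
    # Hash the children in byte order plus the node's own value; the result
    # is determined entirely by the contents, not by the insertion history.
    parts = []
    for byte in sorted(node.children):
        parts.append(bytes([byte]) + node_hash(node.children[byte]))
    parts.append(node.value or b"")
    return h(b"".join(parts))

def script_key(txout_script: bytes) -> bytes:
    # Bound the key length by hashing the TxOut script, as suggested above.
    return hashlib.sha256(txout_script).digest()

# Insert the same (key, value) pairs in two different orders: same root hash.
a, b = TrieNode(), TrieNode()
pairs = [(script_key(bytes([i]) * 25), bytes([i]) * 32) for i in range(5)]
for k, v in pairs:
    insert(a, k, v)
for k, v in reversed(pairs):
    insert(b, k, v)
assert node_hash(a) == node_hash(b)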

Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
vuce
Sr. Member
****
Offline Offline

Activity: 476
Merit: 250


View Profile
June 21, 2012, 08:47:07 PM
Last edit: June 21, 2012, 09:22:18 PM by vuce
 #106

@vuce
The structure of a trie has no dependence on insert order.  Given a set of data, there is only one trie that can hold it.  The same goes for Patricia tries (which are level-compressed tries).  And given that its query, insert and delete times are based strictly on key size (which will be bounded as above), there are no balance issues at all: it always takes exactly 32 "hops" to get from the root to the leaf you want, regardless of whether you are querying, inserting or deleting.  So given fixed key-length, all those operations are actually O(1).  

I was citing the Patricia trie wiki, where it's pretty obvious that new inserts are inserted as leaves of the tree, therefore making them insert-order dependent. If you could direct me to a better explanation, I would appreciate it.

Never mind, I misunderstood how it works  Embarrassed  FWIW, I agree, I think this might be the best choice.
Quote
What other data structures am I missing that could be considered?  I know B-trees would be a good choice if we are going with insert-order-dependent structure:  they are easy to keep balanced with fairly simple rules for balancing.

The AVL tree is the mother of balanced binary trees. They have the smallest worst-case height, so the fastest queries, but slightly slower insert/delete than red-black trees. They are also very easy to implement.

A 2-3-4 tree might also be worth considering. I don't know if it's insert-order-independent or not, but at a quick look it might be. Or maybe a plain 2-3 tree; that one has data in the leaves only, so it looks kind of like a Merkle tree, but it does have quite a bit of overhead.
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1073


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
June 21, 2012, 08:55:14 PM
 #107

I would consider height to be a worthy thing to control, not so much for query speed, but because the nodes from leaf to ancestor might have to be transmitted across the network in response to a query.

Also, as we discuss these tree types, I want to make sure we are not straying far from the definition of a Merkle tree, to maintain the desirable property of being able to prove that key x is or is not in the tree by providing all of the hashes necessary to climb to the root. All of these nifty tree types that put data in the branches rather than the leaf nodes may not retain that important property.  I read about red-black trees on Wikipedia and noticed that the data does not go in the leaf nodes, and I cannot clearly see how I could clip part of that tree, hand it to someone, and have them be able to trust my clipping via a Merkle root.

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
2112
Legendary
*
Offline Offline

Activity: 2128
Merit: 1042



View Profile
June 21, 2012, 09:28:52 PM
 #108

AVL tree is the mother of balanced binary trees. They have the smallest "worst case height", so the fastest query, but a bit slower insert/delete than red-black trees.
I just wanted to point out that query speed is pretty much immaterial. All that matters is the update complexity.

Integrating over the world population of Bitcoin clients the probability of any particular key being queried is almost 0, but the probability of any particular key being inserted/deleted is almost 1. This is pretty much the exact opposite of the assumptions made in all classic information storage and retrieval texts.

If you come up with a really good storage tree with low overhead for insert/delete but bad query performance you can easily fix it by maintaining a secondary index structure that facilitates fast query for keys that are locally interesting. That secondary structure may be different for each individual client and be dependent on the local querying behavior.

Please comment, critique, criticize or ridicule BIP 2112: https://bitcointalk.org/index.php?topic=54382.0
Long-term mining prognosis: https://bitcointalk.org/index.php?topic=91101.0
2112
Legendary
*
Offline Offline

Activity: 2128
Merit: 1042



View Profile
June 21, 2012, 09:44:50 PM
 #109

I am the one who originally suggested the 100 block interval... but I don't think I said that updating the meta tree only every 100 blocks is what should be done.
Also, I urge you to seriously consider batch-updating the primary storage structure, and keeping the recently-heard-of updates in a separate storage area. This should probably be somewhat similar to the generational garbage collection concept.

I would also urge avoiding 100 and instead choosing a divisor of 2016.

Please comment, critique, criticize or ridicule BIP 2112: https://bitcointalk.org/index.php?topic=54382.0
Long-term mining prognosis: https://bitcointalk.org/index.php?topic=91101.0
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1073


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
June 21, 2012, 09:47:25 PM
 #110

I am the one who originally suggested the 100 block interval... but I don't think I said that updating the meta tree only every 100 blocks is what should be done.
Also, I urge you to seriously consider batch-updating the primary storage structure, and keeping the recently-heard-of updates in a separate storage area. This should probably be somewhat similar to the generational garbage collection concept.

I would also urge avoiding 100 and instead choosing a divisor of 2016.

If you mean a factor of 2016, how about 21? (21x96=2016.) That's also a clean factor of 21 million, as well as of the 210,000 blocks between reward changes.

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
2112
Legendary
*
Offline Offline

Activity: 2128
Merit: 1042



View Profile
June 21, 2012, 10:08:06 PM
 #111

If you mean a factor of 2016, how about 21? (21x96=2016.) That's also a clean factor of 21 million, as well as of the 210,000 blocks between reward changes.
I think it is too early to make this decision. I just wanted to stress that the heaviest housekeeping updates should be phase-shifted with respect to the difficulty retarget. In other words the blocks just before and just after the retarget should involve only light housekeeping.

I haven't seen anyone doing any serious game-theoretic analysis of the possible splitting attacks on the global Bitcoin network during the retarget, but I just want to avoid creating additional headaches resulting from batch updates.

Please comment, critique, criticize or ridicule BIP 2112: https://bitcointalk.org/index.php?topic=54382.0
Long-term mining prognosis: https://bitcointalk.org/index.php?topic=91101.0
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1073


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
June 21, 2012, 10:20:30 PM
 #112

I just wanted to confirm that we understand that when the network "retargets", or the block reward halves, it isn't actually doing anything computationally intensive.  All that happens is that the client expects to see a different number in a field in future blocks as compared to past blocks.

Rebuilding a tree isn't too computationally intensive to have happen once every 3.5 hours (which is what 21 blocks would suggest - and is also a relatively fixed interval).  All that matters is that we know it is too computationally intensive to happen per transaction (which is multiple times per minute and tends to infinity).  I can't imagine picking this number is premature, as whether the magic number is 1, 3, 7, 21, or 420 blocks, all choices accomplish the goal of not pegging users' CPUs at 100% chugging through this tree.

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1006


View Profile
June 22, 2012, 01:14:11 AM
Merited by ETFbitcoin (1)
 #113

Quote from: etotheipi
On the other hand, I was hoping for a structure that wasn't too complicated, and both RB trees and Patricia tries have complicated implementations (even though the concepts behind them are fairly simple).  But if we're going to have to go with something complicated, anyway (to limit worst-case speed and time performance), then I'd have to vote for Patricia trie or variant.  Not only is it O(1)... someone brought up the very good point that updates to the tree can mostly be parallelized.  That sounds like another good property of a tree that's going to have very high update rates...
The 2-3-4 tree (aka B-tree of order 4) is really the only one I would consider in this circumstance. RB-trees are actually a special representation of 2-3-4 trees, but the implementation choices in balancing an RB-tree don't exist for 2-3-4 trees (for a given 2-3-4 tree there can exist more than one RB-tree that "encodes" that structure, but not vice versa). A higher-order B-tree would also work, but then you would be trading increased CPU time for decreased I/O, which doesn't fit this application.

That said, the ability to parallelize prefix/radix tries is a very good point. You might win me over to that side yet... but if self-balancing trees are to be used, the 2-3-4 tree has some clear benefits over others (KVL, RB, etc.).
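For anyone who hasn't worked with them: a 2-3-4 node holds one to three sorted keys and, unless it's a leaf, one more child than it has keys, with every leaf at the same depth. Here's a minimal Python sketch of just the node shape and lookup -- illustrative only, no balancing/split code, and not tied to any actual client implementation:

Code:
class Node234:
    """One node of a 2-3-4 tree (a B-tree of order 4): 1-3 sorted keys, 2-4 children."""
    def __init__(self, keys, children=None):
        assert 1 <= len(keys) <= 3
        self.keys = keys                    # sorted keys stored in this node
        self.children = children or []      # empty for a leaf, else len(keys) + 1 subtrees

def search(node, key):
    # Every leaf sits at the same depth, so a lookup touches O(log n) nodes.
    while node is not None:
        i = 0
        while i < len(node.keys) and key > node.keys[i]:
            i += 1
        if i < len(node.keys) and key == node.keys[i]:
            return True
        node = node.children[i] if node.children else None
    return False

# A small hand-built example: the root holds two keys and three leaf children.
root = Node234([10, 20], [Node234([5]), Node234([12, 15]), Node234([25, 30, 40])])
print(search(root, 15), search(root, 7))    # True False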


To the other posts... why would you ever need to rebuild the tree? I don't understand the purpose. If you are using a self-balancing structure then it stays balanced “for free”. And under what circumstance would you have all the transaction data, but not an existing tree structure or the block chain from which you can order the updates?

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1073


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
June 22, 2012, 02:22:26 AM
 #114

If you are using a self-balancing structure then it stays balanced “for free”. And under what circumstance would you have all the transaction data, but not an existing tree structure or the block chain from which you can order the updates?

If you have a self-balancing structure, you may not have a merkle tree, which is what you'd want as a node so you can serve lookups to lite clients that have no block chain, letting them determine with 100% certainty, given just the merkle root, whether you're telling the truth.  What you have instead with all these neat tree ideas is an index to speed up database ops - which would be great if we were building a database engine - but the title of the OP is "trust-free lite nodes".  I am not sure that, by enumerating all of these other wonderful tree structures, we are remembering that we need the cryptographic properties a merkle tree offers to accomplish the stated goal.

Assuming you had a copy of such a tree... the circumstance one would have possession of it, is one as a node would have acquired it from a peer as a complete and total substitute for the block chain (other than the block headers and perhaps a few days of recent blocks).  The whole point of this exercise is to ditch the storage requirement of spent transactions from the block chain for the purpose of scaling Bitcoin and dealing with the fact that the block chain will soon be too heavy to lift - not so much to shave milliseconds off index lookups.

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
etotheipi
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1048


Core Armory Developer


View Profile WWW
June 22, 2012, 02:46:27 AM
 #115

If you are using a self-balancing structure then it stays balanced “for free”. And under what circumstance would you have all the transaction data, but not an existing tree structure or the block chain from which you can order the updates?

If you have a self-balancing structure, you may not have a merkle tree, ...

Any of these can be made into an authenticated data structure.  Each node, including non-leaf nodes, may represent data, so you just append this node's data to the hashes of the non-null children and hash that to get the current node's value.  Then its parent nodes do the same to aggregate their children.

I haven't defined it rigorously, but it can be done.  One issue with a 256-way Patricia/radix tree is that each level will need the values of the other 255 children in order to verify that level.  Granted, it only matters at the top levels where there's a lot of branching; beyond levels 4 and 5, basically all other child pointers will be null.  But it goes along with why I'm hesitant to endorse a Patricia tree: there might be a lot of data to transfer to verify a node.
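To make that hashing rule concrete, here's a rough Python sketch (hypothetical structure, not Armory code): a node's authentication value is the hash of its own data concatenated with the authentication values of its non-null children, so changing any descendant changes the root.

Code:
import hashlib

def sha256d(b):
    # Bitcoin-style double SHA-256
    return hashlib.sha256(hashlib.sha256(b).digest()).digest()

class AuthNode:
    """A node of an arbitrary tree (trie, B-tree, ...) carrying optional data."""
    def __init__(self, data=b'', children=None):
        self.data = data                    # payload stored at this node (may be empty)
        self.children = children or []      # child nodes; None entries are skipped

    def authhash(self):
        # Append this node's data to the hashes of its non-null children,
        # then hash the result to get this node's authentication value.
        payload = self.data
        for child in self.children:
            if child is not None:
                payload += child.authhash()
        return sha256d(payload)

# Tiny example: changing either leaf changes the root's authentication value.
leaf_a = AuthNode(data=b'unspent txout A')
leaf_b = AuthNode(data=b'unspent txout B')
root = AuthNode(data=b'key prefix', children=[leaf_a, None, leaf_b])
print(root.authhash().hex())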



Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
2112
Legendary
*
Offline Offline

Activity: 2128
Merit: 1042



View Profile
June 22, 2012, 03:00:20 AM
 #116

why would you ever need to rebuild the tree?
Probably instead of "rebuilding the tree" the better phrase would be "rehashing the nodes on the update path in the tree". The rehashing of many short strings would completely dominate the cost of the update (insertion/deletion). This is one of those things that the O() notation doesn't convey.

The benefits of a short-and-fat data structure over a tall-and-lean are four-fold:

1) less rehash overhead after update
2) less latency sensitivity when queried level-wise over the p2p network (at the expense of wasted bandwidth)
3) lower write amplification if this structure is stored locally in the block-storage device
4) ability to batch-update the nodes incurring single rehash overhead for multiple inserts/deletions.

Ultimately, from the point of view of long-term code stability, it would be better to choose a more generic structure instead of a 2-3-4 tree. If we choose an N-to-M tree we could set the N and M values based on experimentation now, and possibly change them easily in the future if experience shows that our initial guess was not so good.
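As a rough illustration of point 4: in a toy model of a complete binary hash tree, applying updates one at a time rehashes the full root-to-leaf path every time, while batching the same updates rehashes each node on the union of those paths only once. A small Python calculation under those assumptions:

Code:
import random

DEPTH = 20                       # a complete binary tree with 2**20 leaves
K = 1000                         # leaf updates applied in one block/batch

def path_nodes(leaf_index, depth):
    """Return the set of (level, index) nodes on the leaf-to-root path."""
    nodes, idx = set(), leaf_index
    for level in range(depth, -1, -1):
        nodes.add((level, idx))
        idx //= 2
    return nodes

leaves = random.sample(range(2 ** DEPTH), K)

one_at_a_time = K * (DEPTH + 1)          # every single update rehashes its whole path
batched = len(set().union(*(path_nodes(l, DEPTH) for l in leaves)))   # shared upper nodes hashed once

print(one_at_a_time, batched)            # 21000 vs. noticeably fewer: the top of the tree is shared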

Please comment, critique, criticize or ridicule BIP 2112: https://bitcointalk.org/index.php?topic=54382.0
Long-term mining prognosis: https://bitcointalk.org/index.php?topic=91101.0
tevirk
Newbie
*
Offline Offline

Activity: 15
Merit: 0



View Profile
June 22, 2012, 05:05:48 AM
 #117

Sorry to be slow, but I don't see the gain here.  If a lightweight client is going to trust that a metablock that's been merged into the chain is truthful (because it's been built into a block), then it can just as reliably trust that a transaction that's in the chain a few blocks back is valid, because it's been built into a block.


That works if, as a lightweight node, you plan only on receiving funds that have a very small number of confirmations, which eliminates your view of the majority of bitcoins that exist.  In USD terms, this would be like limiting yourself to only being able to accept crisp dollar bills that have never been handled more than once or twice.  More likely than not, you're going to need to be able to receive funds from anybody, which will have been confirmed anywhere on the block chain between the genesis block and now.  You either need the whole block chain to know whether a given incoming transaction is valid, or at least the digested tree of all unspent txouts for the entire block chain.

I'm not talking about the inputs to the transaction that pays me, I'm talking about the transaction that pays me itself. Fred posts a transaction  which he says pays me 100 bitcoins. If I have the digested tree, or the whole block chain, I can check the transaction is valid. If I don't, I can't. But either way, it's still unverified. I'm going to wait for confirmations. Once I have confirmations, then I know it's valid too, because if I'm trusting the miners not to include fictional unspent txout digests I might just as well trust them not to include invalid transactions.

The problem this mechanism solves - validating an unverified transaction in a lightweight node - doesn't look to me like a very important one.
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1073


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
June 22, 2012, 05:30:01 AM
 #118

I'm not talking about the inputs to the transaction that pays me, I'm talking about the transaction that pays me itself. Fred posts a transaction  which he says pays me 100 bitcoins. If I have the digested tree, or the whole block chain, I can check the transaction is valid. If I don't, I can't. But either way, it's still unverified. I'm going to wait for confirmations. Once I have confirmations, then I know it's valid too, because if I'm trusting the miners not to include fictional unspent txout digests I might just as well trust them not to include invalid transactions.

The problem this mechanism solves - validating an unverified transaction in a lightweight node - doesn't look to me like a very important one.


Sure, if you aren't mining, don't mind spam, and don't see any value in reliably knowing you have received funds from Fred in under an hour, since you can't tell the difference between a bogus transaction from Fred with fake inputs and a real one.  You might be willing and able to wait 6 confirmations before deciding others have paid you, but others won't.

Under this proposal, miners can reliably mine using these trees and not the block chain.  If you think all miners will want to lug around a block chain whose size tends closer to infinity with each person who starts running a gambling bot, then sure, this isn't important.

Being able to validate a transaction instantly is important for spam prevention.  Nodes only relay valid transactions.  If you can't validate transactions, you have no choice but to blindly spew to your peers anything any of them sends.  You'll be a sitting duck for DoS attacks (since for every 1 message coming in you'll nominally send 7 out), and a whole network made of nodes like this would be easy to spam into oblivion.

Finally, this tree proposal isn't meant to RUN on a lightweight node.  It is meant to make a normal node be able to SERVE another lightweight node, at the same time not having to have the full unabridged block chain.

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
tevirk
Newbie
*
Offline Offline

Activity: 15
Merit: 0



View Profile
June 22, 2012, 06:08:09 AM
 #119

I'm not talking about the inputs to the transaction that pays me, I'm talking about the transaction that pays me itself. Fred posts a transaction  which he says pays me 100 bitcoins. If I have the digested tree, or the whole block chain, I can check the transaction is valid. If I don't, I can't. But either way, it's still unverified. I'm going to wait for confirmations. Once I have confirmations, then I know it's valid too, because if I'm trusting the miners not to include fictional unspent txout digests I might just as well trust them not to include invalid transactions.

The problem this mechanism solves - validating an unverified transaction in a lightweight node - doesn't look to me like a very important one.


Sure, if you aren't mining, don't mind spam, and don't see any value in reliably knowing you have received funds from Fred in under an hour, since you can't tell the difference between a bogus transaction from Fred with fake inputs and a real one.  You might be willing and able to wait 6 confirmations before deciding others have paid you, but others won't.

Under this proposal, miners can reliably mine using these trees and not the block chain.  If you think all miners will want to lug around a block chain whose size tends closer to infinity with each person who starts running a gambling bot, then sure, this isn't important.

Being able to validate a transaction instantly is important for spam prevention.  Nodes only relay valid transactions.  If you can't validate transactions, you have no choice but to blindly spew to your peers anything any of them sends.  You'll be a sitting duck for DoS attacks (since for every 1 message coming in you'll nominally send 7 out), and a whole network made of nodes like this would be easy to spam into oblivion.

Finally, this tree proposal isn't meant to RUN on a lightweight node.  It is meant to make a normal node be able to SERVE another lightweight node, at the same time not having to have the full unabridged block chain.


So the scenario in which this helps is where

(a) transaction volume is so high that even miners running fancy purpose-built mining rigs can't store the transaction history on a standard-issue 1TB hard drive
(b) every Tom, Dick and Harry runs a lightweight node which relays every single transaction on a P2P network.

Those two conditions contradict each other.  If transaction rate goes up that high (and I think it shouldn't, but that's an entirely different discussion), bandwidth becomes the limiting factor before storage space does. At that transaction rate, inevitably the bitcoin network evolves to a backbone of heavy nodes exchanging everything and lightweight clients which consume only data of interest to them. That's quite independent of how the history is handled.

As to unconfirmed transactions, are there really going to be that many people who will accept an unconfirmed transaction, but not be willing to trust anyone to validate it for them?
maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1006


View Profile
June 22, 2012, 07:21:48 AM
 #120

Maybe this will help: the trouble is when Satoshi released Bitcoin 0.1 and the whitepaper, he came up with this idea for a version of the client that only kept block headers and skimmed for transactions it cares about. It's become known as SPV--Simple Payment Verification--or a “lightweight client” and has a weaker trust model than a full verifying client.

We are not discussing that in this thread. What's going on here is the creation of a full-featured “thick client” which doesn't require the entire block chain history and could conceivably be run even on low-memory and embedded devices as bitcoin scales up. You can have your cake and eat it too.

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
etotheipi
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1048


Core Armory Developer


View Profile WWW
June 22, 2012, 12:08:58 PM
 #121

The number one problem that this solves (besides blockchain pruning) is one of trust.  When a new node gets on the network right now, there are two options:

(1) Run a full node, with full blockchain, ultimate security -- verify everything in the blockchain yourself
(2) Run a lightweight node, with reduced level of security -- trust that someone else gave you your own correct balance, and have no way to check whether transactions are valid, especially zero-conf tx

With this meta-chain in place, you can run (1) with a lot less disk space, and lightweight nodes (2) can achieve the ultimate security model without needing to hold the full chain.  I can verifiably import my own wallet knowing nothing but the headers and verify it directly against the blockchain.  If someone gives me a zero-conf tx, I can check not only that its inputs exist, but that the inputs haven't been spent yet.  Zero-conf tx should not really be trusted anyway... but at least you can verify whether it's even possible for the network to accept it, which is what a full node does.


Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
Serith
Sr. Member
****
Offline Offline

Activity: 269
Merit: 250


View Profile
June 22, 2012, 04:46:33 PM
 #122

If someone gives me a zero-conf tx, I can check not only that its inputs exist, but that the inputs haven't been spent yet.  Zero-conf tx should not really be trusted anyway... but at least you can verify whether it's even possible for the network to accept it, which is what a full node does.
I posted an idea about how to make 0-conf tx safe. There was some discussion, and the idea wasn't killed, but there wasn't much interest either, so I am starting to believe that there is no need for 0-conf tx.
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1073


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
June 22, 2012, 04:49:16 PM
 #123

I posted an idea about how to make 0-conf tx safe. There was some discussion, and it wasn't killed, but there wasn't much interest either, so I am starting to believe that it's not something in demand.

I think when one of the core developers immediately points out how it can be used to reliably defraud somebody, that makes it pretty much DOA.  But I would propose that there is indeed demand for a reliable zero-confirmation transaction.

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
etotheipi
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1048


Core Armory Developer


View Profile WWW
June 22, 2012, 07:38:33 PM
 #124

I posted an idea about how to make 0-conf tx safe. There was some discussion, and it wasn't killed, but there wasn't much interest either, so I am starting to believe that it's not something in demand.

I think when one of the core developers immediately points out how it can be used to reliably defraud somebody, that makes it pretty much DOA.  But I would propose that there is indeed demand for a reliable zero-confirmation transaction.

I hate bringing up zero-conf tx in general, because there are so many issues with them that they are useful only in isolated cases, usually partial-trust situations.  I guess this shouldn't be seen as a driving force for implementing this proposal, but more as an example of how much verifiable information can be obtained from the network with minimal download.  It's fast enough that a light node could obtain as much information about a zero-conf tx as a full node, with just a few kB downloaded.  Regardless of how that information is used, it's a huge functional advantage over where we are currently.

Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
Serith
Sr. Member
****
Offline Offline

Activity: 269
Merit: 250


View Profile
June 22, 2012, 11:43:28 PM
Last edit: June 22, 2012, 11:55:01 PM by Serith
 #125

I posted an idea about how to make 0-conf tx safe. There was some discussion, and it wasn't killed, but there wasn't much interest either, so I am starting to believe that it's not something in demand.

I think when one of the core developers immediately points out how it can be used to reliably defraud somebody, that makes it pretty much DOA.  But I would propose that there is indeed demand for a reliable zero-confirmation transaction.

I believe he was wrong; at least, no one objected when I explained why it is not a problem: a basic double-spend check by a pool completely solves it. And if you agree with him, could you elaborate on your opinion here or in the thread? The good thing about the idea is that it shares with etotheipi's proposal the property of being non-disruptive. It doesn't require any changes to Bitcoin, just collaboration between 2 or 3 major pools.

I pay you normally without using this system, then use the pool-signed txn to pay myself in a doublespend.  It would make attacks very reliable.
A pool must check whether there is any conflicting transaction already present and, if so, refuse to sign the multi-signature transaction.
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1073


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
June 23, 2012, 12:33:42 AM
Last edit: June 23, 2012, 12:50:35 AM by casascius
 #126

Instead of saying your idea is wrong, maybe I can contribute to it.

One of the things gmaxwell pointed out was that mining pools may be around now but there is no guarantee they will be around later, in fact he thinks they probably won't. So depending on them as fundamental architecture is probably a bad idea all around.

But imagine miners (both solo and pools) included an IP:Port calling card in the coinbase of their block. The calling card would convey the message: I am a miner, you can contact me directly via UDP at (ip:port); send me your transaction, and if it looks valid, I will give you a signed promise (signed by the coinbase key) that I accept and plan to confirm this transaction.

One would know what percentage of the mining pool any given calling card represents just by the number of recent blocks containing it.

Someone wanting a miner commitment on a transaction would blast out that transaction via UDP to all of the miners whose calling cards appear in the last 1000 blocks.  That sounds extreme, but we're only talking a few hundred kilobytes total, with the total going to each miner being under 1-2KB.

By using UDP instead of TCP, one could blindly blast out a bunch of simultaneous requests into the internet on a "best effort" basis with low overhead, knowing most of them will arrive and many won't and that that's OK.  The responses would arrive the same way.

Either the sender or the recipient of a transaction could immediately contact thousands of miners with a blast of udp packets and rapidly get an accurate feel for how much mining power supporting the transaction has just by gauging the udp responses that come back within the following 10 seconds.

If it is a supermajority you have success.

Such could be the standard practice for accepting zero-conf transactions.  It could be an excellent revenue generator for the mining community as a whole in the form of a for-pay service (for example, all miners could stipulate that this UDP confirmation service is only available if the transaction fee [in the transaction being zero-confirmed] meets a far more generous criterion than the satoshi client minimum).

To address gmaxwell's rightfully placed fear that I could pay "you" and then use the premium service to pay the doublespend to myself... if getting zero-conf is a fee-based service paid by the payer, then "you" could demand, as a condition of giving me goods with zero conf, that I include a fee big enough to ensure that you can click a button in your client and enjoy the confirmation service yourself, prepaid.
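To sketch what that UDP blast might look like, here's toy Python -- everything in it is hypothetical (no such miner endpoints, message format, or "ACK" scheme exists today, and a real client would verify a signature against the coinbase key instead of a prefix check):

Code:
import select, socket, time

# Hypothetical calling cards harvested from recent coinbases:
# (ip, port, fraction of the last 1000 blocks mined by that miner).
MINERS = [("198.51.100.1", 8444, 0.18),
          ("198.51.100.2", 8444, 0.07),
          ("203.0.113.5",  8444, 0.02)]

def blast_tx(raw_tx, miners, timeout=10.0):
    """Best-effort UDP blast of a raw transaction; returns the share of recent
    hashpower whose miners answered.  Losses are expected and tolerated."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setblocking(False)
    for ip, port, _share in miners:
        try:
            sock.sendto(raw_tx, (ip, port))          # fire and forget
        except OSError:
            pass
    committed, seen = 0.0, set()
    deadline = time.time() + timeout
    while True:
        remaining = deadline - time.time()
        if remaining <= 0:
            break
        ready, _, _ = select.select([sock], [], [], remaining)
        if not ready:
            break
        data, addr = sock.recvfrom(4096)
        for ip, port, share in miners:
            # A real client would check a signature by the coinbase key here;
            # this sketch just looks for a made-up "ACK" prefix.
            if addr == (ip, port) and addr not in seen and data.startswith(b"ACK"):
                seen.add(addr)
                committed += share
    return committed

# share = blast_tx(signed_tx_bytes, MINERS)
# if share > 0.5: treat the zero-conf payment as backed by a supermajority of hashpower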

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
Serith
Sr. Member
****
Offline Offline

Activity: 269
Merit: 250


View Profile
June 23, 2012, 09:37:09 PM
 #127

Instead of saying your idea is wrong, maybe I can contribute to it.

One of the things gmaxwell pointed out was that mining pools may be around now but there is no guarantee they will be around later, in fact he thinks they probably won't. So depending on them as fundamental architecture is probably a bad idea all around.

But imagine miners (both solo and pools) included an IP:Port calling card in the coinbase of their block. The calling card would convey the message: I am a miner, you can contact me directly via UDP at (ip:port); send me your transaction, and if it looks valid, I will give you a signed promise (signed by the coinbase key) that I accept and plan to confirm this transaction.

One would know what percentage of the mining pool any given calling card represents just by the number of recent blocks containing it.

Someone wanting a miner commitment on a transaction would blast out that transaction via UDP to all of the miners whose calling cards appear in the last 1000 blocks.  That sounds extreme, but we're only talking a few hundred kilobytes total, with the total going to each miner being under 1-2KB.

By using UDP instead of TCP, one could blindly blast out a bunch of simultaneous requests into the internet on a "best effort" basis with low overhead, knowing most of them will arrive and many won't and that that's OK.  The responses would arrive the same way.

Either the sender or the recipient of a transaction could immediately contact thousands of miners with a blast of udp packets and rapidly get an accurate feel for how much mining power supporting the transaction has just by gauging the udp responses that come back within the following 10 seconds.

If it is a supermajority you have success.

Such could be the standard practice for accepting zero-conf transactions.  It could be an excellent revenue generator for the mining community as a whole in the form of a for-pay service (for example, all miners could stipulate that this UDP confirmation service is only available if the transaction fee [in the transaction being zero-confirmed] meets a far more generous criterion than the satoshi client minimum).

To address gmaxwell's rightfully placed fear that I could pay "you" and then use the premium service to pay the doublespend to myself... if getting zero-conf is a fee-based service paid by the payer, then "you" could demand, as a condition of giving me goods with zero conf, that I include a fee big enough to ensure that you can click a button in your client and enjoy the confirmation service yourself, prepaid.

I agree with you and with the second part of gmaxwell's post, that my proposal brings more centralization, and that's not ideal. You are thinking about making 0-confirmation txns accepted in a decentralized or less centralized way, which would be great if it's possible. I see problems with your proposal, but I am not ready to discuss them yet.

I think we are derailing etotheipi's thread so I made a copy of the post in my thread.
eldentyrell
Donator
Legendary
*
Offline Offline

Activity: 980
Merit: 1004


felonious vagrancy, personified


View Profile WWW
June 24, 2012, 09:45:09 PM
Last edit: June 24, 2012, 10:43:45 PM by eldentyrell
 #128

Trustless lightweight-node support

It doesn't seem trustless to me. Lightweight nodes (not storing all unspent outputs) can't know whether a block is valid, so they need to trust the majority of the network's mining power. This is no more secure than SPV, though possibly a little easier for lightweight nodes.

This is a subtle but important point.  A while back I wrote a section about it on the bitcoin wiki.  The Satoshi client never uses "number of blocks deep" as a measure of confidence that a transaction is valid.  Depth is used only as a measure of the likelihood of another, longer chain branch emerging that omits the transaction.

A truly trustless thin client needs to be able to verify a recent block's height (that there really are 180,000 blocks before this one and they obey the max-4x-difficulty-adjustment rule) rather than its depth (that there really were 6 blocks built on top of it -- also known as confirmations).

It's possible to do height verification probabilistically without 2GB of disk+download, but you need more than a tree-of-unspent-transactions to do it -- the block chain has to look more like an upside-down tree or DAG (most recent block is the root) giving you multiple hash-secured O(log n)-long paths from any block to the genesis block.  These "shortcut ancestor links" can be added without 51% agreement.  If each challenge consists of the thin client picking the log(n)-long path and the server replying with the block headers along that path, it doesn't take many challenges or much time/bandwidth/space to drive the probability-of-being-snookered down to something well below the probability of your hardware randomly failing.  Once you have block height verification you can be truly trustless -- or, at least as trustless as the Satoshi client.
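Here's a rough Python sketch of one such challenge, under simplifying assumptions: each block record carries its raw header plus the ancestor links committed in its coinbase, the thin client already trusts the tip it starts from, and fetch_record() is a hypothetical call to an untrusted server.

Code:
import hashlib, random

def hdr_hash(header_bytes):
    return hashlib.sha256(hashlib.sha256(header_bytes).digest()).digest()

# Assumed per-block record shape the server would hold:
#   {'height': int, 'header': bytes, 'links': {ancestor_height: ancestor_header_hash}}
# 'links' holds the ordinary prevblock hash plus any shortcut ancestor links.

def challenge(tip_record, fetch_record, genesis_hash):
    """Walk one randomly chosen chain of committed links from the tip down to
    genesis, checking every hash commitment along the way.  Each successful
    challenge makes it less likely the server is hiding fake history."""
    record, hops = tip_record, 0
    while record['height'] > 0:
        height, expected = random.choice(list(record['links'].items()))
        nxt = fetch_record(height)                      # ask the untrusted server
        if nxt['height'] != height or hdr_hash(nxt['header']) != expected:
            return False, hops                          # broken commitment: reject this server
        record, hops = nxt, hops + 1
    return hdr_hash(record['header']) == genesis_hash, hops

With well-placed shortcut links the walk is only O(log n) hops, so repeating the challenge a handful of times costs a few kB of headers while compounding the odds against a lying server.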

The printing press heralded the end of the Dark Ages and made the Enlightenment possible, but it took another three centuries before any country managed to put freedom of the press beyond the reach of legislators.  So it may take a while before cryptocurrencies are free of the AML-NSA-KYC surveillance plague.
galambo
Sr. Member
****
Offline Offline

Activity: 672
Merit: 252


View Profile
June 24, 2012, 10:46:45 PM
 #129


This is a subtle but important point.  A while back I wrote a section about it on the bitcoin wiki.  The Satoshi client never uses "number of blocks deep" as a measure of confidence that a transaction is valid.

A truly trustless thin client needs to be able to verify a recent block's height (that there really are 180,000 blocks before this one and they obey the max-difficulty adjustment rules) rather than its depth (that there really were 6 blocks built on top of it -- also known as confirmations).

It's possible to do height verification probabilistically without 2GB of disk+download, but you need more than a tree-of-unspent-transactions to do it -- the block chain has to look more like a tree giving you multiple hash-secured O(log n)-long paths from any block to the genesis block.  These "shortcut ancestor links" can be added without 51% agreement.

How will the lightweight node know the path hashes are incorrect and to reject that part of a block?

I can see how the thick clients can verify this, but rejecting an incorrect path would require all of the thick clients to reject any block that contained a false path. I think this means another protocol decision point like P2SH.  Otherwise, we'd have ignorant thick clients forwarding maliciously crafted "tree transactions" to the light clients.

This isn't meant to be a gotcha. I like your idea (a lot), but I'd like some clarification. I've only started looking into this part of Bitcoin and I'm not a very big data structures guy.

I also don't understand why you think these could be added without the "51%."
eldentyrell
Donator
Legendary
*
Offline Offline

Activity: 980
Merit: 1004


felonious vagrancy, personified


View Profile WWW
June 24, 2012, 10:55:29 PM
Last edit: June 24, 2012, 11:20:01 PM by eldentyrell
 #130

How will the lightweight node know the path hashes are incorrect and to reject that part of a block?

All of these "links" are hashes -- they're self-validating just like the previous-block (prevblock) hash that Bitcoin currently uses.  These are just "extra prevblock pointers", but they're in the coinbase instead of in the headers and they point to an arbitrary ancestor instead of the immediate predecessor.  I'll call these new pointers non-prevblock pointers to distinguish them from the pointers we already have.


I can see how the thick clients can verify this, but an incorrect path would require all of the thick clients to reject the block if it contained a false tree.

I definitely do not propose adding any new block validity criteria.  You're right: hostile miners can add blocks with broken non-prevblock pointers.


Otherwise, we'd have ignorant thick clients forwarding malicously crafted "tree transactions" to the light clients.

If a hostile miner gets a block with broken non-prevblock pointers into the chain, it will amount to -- at worst -- a DOS attack on thin clients until the next block is found by friendly miners.  As long as the hostile block is at the top of the chain, all thin clients will believe that all servers are lying to them.  But they won't be compromised -- they'll simply twiddle their thumbs saying "Nobody has been able to convince me of what reality ought to look like".

As soon as the next {non-prevblock-pointer}-aware miner finds a block, that block will simply refrain from creating any new paths through the malicious block.  Thin clients resume operation as normal.  In effect, blocks with broken non-prevblock pointers get excluded from the "DAG-within-the-blockchain".

Thin clients which were connected before the malicious block arrived won't be affected.

There are ways to make the cost of the DOS attack above very large (like >99% hashpower) but they add complexity.  


I think this means another protocol decision point like P2SH.

Definitely not!  I believe the trustless thin client problem can be solved without another hardfork (or even miners-only fork).


I also don't understand why you think these could be added without the "51%."

Right, so, suppose I control 1% of the network hashpower.  That means I create 1% of the blocks.  If the radix of the non-prevblock-pointer tree is, say, 200 (i.e. each block can have 200 non-prevblock pointers) that means that my block will be reducing the average path length in the tree more rapidly than the other 99% of the network is adding new blocks.  So the average path length will gradually converge to log(height), even if only 1% of the miners are adding the new pointers -- it's just that each block they add has to include more than 100 new pointers.  Of course, the more miners you have the more rapidly we get to the ideal log(height) situation…
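A quick back-of-the-envelope check of that claim with the numbers above (1% of hashpower, radix 200):

Code:
my_share = 0.01     # fraction of all blocks I mine
radix = 200         # ancestor links one of my blocks can commit to

# On average 1/0.01 = 100 other blocks appear between two of mine, but each of
# my blocks can commit up to 200 links -- enough to cover everything added in
# between and still shorten older paths, so average path length keeps trending
# toward log(height) even with only 1% of miners participating.
blocks_between_mine = 1 / my_share
print(blocks_between_mine, radix, radix > blocks_between_mine)   # 100.0 200 True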

The printing press heralded the end of the Dark Ages and made the Enlightenment possible, but it took another three centuries before any country managed to put freedom of the press beyond the reach of legislators.  So it may take a while before cryptocurrencies are free of the AML-NSA-KYC surveillance plague.
eldentyrell
Donator
Legendary
*
Offline Offline

Activity: 980
Merit: 1004


felonious vagrancy, personified


View Profile WWW
June 24, 2012, 11:35:40 PM
 #131

If each challenge consists of the thin client picking the log(n)-long path and the server replying with the block headers along that path, it doesn't take many challenges or much time/bandwidth/space to drive the probability-of-being-snookered down to something well below the probability of your hardware randomly failing.

By the way, since these servers are full clients they could charge thin clients some tiny amount like 0.00000001 BTC per challenge, giving people an incentive to run full-chain clients on machines with big disks.

Thin clients should use whichever server they are able to contact that has failed the fewest challenges (an honest server will only fail challenges if there is a hostile block at the top of the chain).  Bad guys could run a server without a blockchain that just spews back junk answers, but they'd only get one challenge per thin client and then be promptly ignored -- not enough money to be worth the trouble.

The printing press heralded the end of the Dark Ages and made the Enlightenment possible, but it took another three centuries before any country managed to put freedom of the press beyond the reach of legislators.  So it may take a while before cryptocurrencies are free of the AML-NSA-KYC surveillance plague.
maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1006


View Profile
June 25, 2012, 02:41:59 PM
 #132

Over in the altchain section I announced a crowd funding campaign for a demurrage currency. If we reach our goal, one of the things we will do is fully implement etothepi's proposal, either in Armory or the official client, and back-port the changes to Bitcoin.

Although relevant, I don't want to spam the forum, so this will be my only cross-post regarding it.

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
btharper
Sr. Member
****
Offline Offline

Activity: 389
Merit: 250



View Profile
June 26, 2012, 06:49:13 AM
 #133

Sub. I think I'll have to reread everything; I'm not sure there's one post with a full description anymore, just a lot of ideas that have been getting pieced together. Not to say that's bad, I just need to make sure I can parse out the current "best" proposal.
mp420
Hero Member
*****
Offline Offline

Activity: 501
Merit: 500


View Profile
June 27, 2012, 08:48:06 AM
 #134

I think this, and other related proposals, are the only ones around that are really taking scalability seriously. I haven't really gotten my head around the specifics yet, but the thing I really like about this proposal is that it requires no changes to "Bitcoin Proper" at all.

I really hope this goes forward.
unclescrooge
aka Raphy
Hero Member
*****
Offline Offline

Activity: 868
Merit: 1000


View Profile
July 06, 2012, 12:53:30 PM
 #135

Hello,

Did the developers reach an agreement on how to prune the blockchain?

I didn't see much activity on the mailing list?
apetersson
Hero Member
*****
Offline Offline

Activity: 668
Merit: 501



View Profile
July 06, 2012, 03:01:21 PM
Last edit: July 06, 2012, 08:49:30 PM by apetersson
 #136

I think right now we need an experimental implementation to see how this approach would perform in practice.

IMO, the ideas outlined by etotheipi are the right way to go for a tiered bitcoin network, and for better scalability.
jgarzik
Legendary
*
qt
Offline Offline

Activity: 1596
Merit: 1022


View Profile
July 06, 2012, 03:45:28 PM
 #137

Did the developers reach an agreement on how to prune the blockchain?

sipa has been working on his "ultraprune" branch at github.  It is discussed on IRC.


Jeff Garzik, Bloq CEO, former bitcoin core dev team; opinions are my own.
Visit bloq.com / metronome.io
Donations / tip jar: 1BrufViLKnSWtuWGkryPsKsxonV2NQ7Tcj
apetersson
Hero Member
*****
Offline Offline

Activity: 668
Merit: 501



View Profile
July 06, 2012, 08:49:05 PM
 #138

sipa has been working on his "ultraprune" branch at github.  It is discussed on IRC.

The pruning efforts are nice and will lead to much faster desktop clients and full nodes, but they do not really solve the problem of very light nodes trying to obtain their bitcoin balance and transaction history from the network quickly.
ArticMine
Legendary
*
Offline Offline

Activity: 2282
Merit: 1046


Monero Core Team


View Profile
July 11, 2012, 10:55:24 PM
 #139

Before commenting on this thread I reviewed Satoshi Nakamoto's original paper: Bitcoin: A Peer-to-Peer Electronic Cash System, bitcoin.org/bitcoin.pdf, and I am left with two questions:

1) How is this proposal better or worse than 7. Reclaiming Disk Space in "Bitcoin: A Peer-to-Peer Electronic Cash System" with respect to overall blockchain size management?
2) How is this proposal better or worse than 8. Simplified Payment Verification in "Bitcoin: A Peer-to-Peer Electronic Cash System" with respect to verifying payments?

Concerned that blockchain bloat will lead to centralization? Storing less than 4 GB of data once required the budget of a superpower and a warehouse full of punched cards. https://upload.wikimedia.org/wikipedia/commons/8/87/IBM_card_storage.NARA.jpg https://en.wikipedia.org/wiki/Punched_card
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1073


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
July 11, 2012, 11:07:56 PM
 #140

Before commenting on this thread I reviewed Satoshi Nakamoto's original paper: Bitcoin: A Peer-to-Peer Electronic Cash System, bitcoin.org/bitcoin.pdf, and I am left with two questions:

1) How is this proposal better or worse than 7. Reclaiming Disk Space in "Bitcoin: A Peer-to-Peer Electronic Cash System" with respect to overall blockchain size management?

This works as a way to reclaim disk space provided you are starting with the whole block chain, but as presented, there is no way for one node to convey that stubbed tree to another node along with the assurance that only spent transactions have been removed.  If I run a node that prunes and stubs off a transaction showing I spent some coins, and then send you that pruned block, my spent coins look unspent to you.

Since it's a solution that's only useful to a node with the full block chain, and the real problem we face is more the downloading of the block chain rather than storing it, a solution that requires a full block chain download before anything can be safely pruned doesn't address the problem.

2) How is this proposal better or worse than 8. Simplified Payment Verification in "Bitcoin: A Peer-to-Peer Electronic Cash System" with respect to verifying payments?

That proposal suggests spending the funds and then watching to see if the rest of the network confirms the spend into a block before any useful verification is possible, or freshly receiving the funds while watching new blocks.  The idea discussed in this thread would allow instant verification of the existence of pre-existing funds without having to spend them first or downloading any blocks at all - and is actually not a different proposal, but the same proposal with significant improvements.

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
ArticMine
Legendary
*
Offline Offline

Activity: 2282
Merit: 1046


Monero Core Team


View Profile
July 11, 2012, 11:31:18 PM
 #141

Before commenting on this thread I reviewed Satoshi Nakamoto's original paper: Bitcoin: A Peer-to-Peer Electronic Cash System, bitcoin.org/bitcoin.pdf, and I am left with two questions:

1) How is this proposal better or worse than 7. Reclaiming Disk Space in "Bitcoin: A Peer-to-Peer Electronic Cash System" with respect to overall blockchain size management?

This works as a way to reclaim disk space provided you are starting with the whole block chain, but as presented, there is no way for one node to convey that stubbed tree to another node along with the assurance that only spent transactions have been removed.  If I run a node that prunes and stubs off a transaction showing I spent some coins, and then send you that pruned block, my spent coins look unspent to you.

Since it's a solution that's only useful to a node with the full block chain, and the real problem we face is more the downloading of the block chain rather than storing it, a solution that requires a full block chain download before anything can be safely pruned doesn't address the problem.

2) How is this proposal better or worse than 8. Simplified Payment Verification in "Bitcoin: A Peer-to-Peer Electronic Cash System" with respect to verifying payments?

That proposal suggests spending the funds and then watching to see if the rest of the network confirms the spend into a block before any useful verification is possible, or freshly receiving the funds while watching new blocks.  The idea discussed in this thread would allow instant verification of the existence of pre-existing funds without having to spend them first or downloading any blocks at all - and is actually not a different proposal, but the same proposal with significant improvements.

Yes but under (1) what happens when you actually try to double spend the funds to me? I can still verify the double spend is in fact a double spend because I have the subsequent block hashes so what is the incentive to convey the block with the previous spend information removed?

Concerned that blockchain bloat will lead to centralization? Storing less than 4 GB of data once required the budget of a superpower and a warehouse full of punched cards. https://upload.wikimedia.org/wikipedia/commons/8/87/IBM_card_storage.NARA.jpg https://en.wikipedia.org/wiki/Punched_card
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1073


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
July 11, 2012, 11:49:37 PM
 #142

Yes but under (1) what happens when you actually try to double spend the funds to me? I can still verify the double spend is in fact a double spend because I have the subsequent block hashes so what is the incentive to convey the block with the previous spend information removed?

The subsequent block hashes don't tell you whether or not the funds are spent.  The only way you know funds are spent is that you know of a transaction that spends it.  There is no present way to know that certain funds are NOT spent unless you have the whole block chain coming after that transaction.

Remember, the goal is to eliminate a boundless multi-gigabyte download for new users.  The only way to solve that is to remove some information from that data set so it is smaller.  The party receiving the reduced data set can be assured that the data that's there actually belongs there, but he has no way to know whether the missing (pruned) data was actually supposed to be pruned.  This proposal addresses that.

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
ArticMine
Legendary
*
Offline Offline

Activity: 2282
Merit: 1046


Monero Core Team


View Profile
July 12, 2012, 12:17:44 AM
 #143

Yes but under (1) what happens when you actually try to double spend the funds to me? I can still verify the double spend is in fact a double spend because I have the subsequent block hashes so what is the incentive to convey the block with the previous spend information removed?

The subsequent block hashes don't tell you whether or not the funds are spent.  The only way you know funds are spent is that you know of a transaction that spends it.  There is no present way to know that certain funds are NOT spent unless you have the whole block chain coming after that transaction.

Remember, the goal is to eliminate a boundless multi-gigabyte download for new users.  The only way to solve that is to remove some information from that data set so it is smaller.  The party receiving the reduced data set can be assured that the data that's there actually belongs there, but he has no way to know whether the missing (pruned) data was actually supposed to be pruned.  This proposal addresses that.

The fact that the correct data is pruned is secured on a second blockchain with merged mining. I can see the point of this for certain applications such as verifying the integrity of a physical Bitcoin without actually opening it up and spending the funds. So there is an advantage in that respect over the proposal in Bitcoin: A Peer-to-Peer Electronic Cash System.

Concerned that blockchain bloat will lead to centralization? Storing less than 4 GB of data once required the budget of a superpower and a warehouse full of punched cards. https://upload.wikimedia.org/wikipedia/commons/8/87/IBM_card_storage.NARA.jpg https://en.wikipedia.org/wiki/Punched_card
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1073


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
July 12, 2012, 12:24:48 AM
 #144

The fact that the correct data is pruned is secured on a second blockchain with merged mining. I can see the point of this for certain applications such as verifying the integrity of a physical Bitcoin without actually opening it up and spending the funds. So there is an advantage in that respect over the proposal in Bitcoin: A Peer-to-Peer Electronic Cash System.

It's also useful for instantly screening an incoming unconfirmed transaction as being "good pending confirmation" versus "totally bogus and no chance of confirmation" without needing a block chain at all.

It's also useful not just for physical bitcoins, but if people start printing disposable bitcoin cash from their printer (example: a user clicks File -> Print Money to "be their own bank" instead of driving to an ATM and paying an ATM fee - something I see as wildly compatible with the average Joe; such self-printed bills would have the same requirements as physical bitcoins).

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
iain
Jr. Member
*
Offline Offline

Activity: 33
Merit: 7



View Profile WWW
July 12, 2012, 01:51:43 AM
Last edit: July 12, 2012, 02:18:07 AM by iain
 #145

Apologies if this is a stupid question, but... would it not also be of benefit to make available, by the same sort of methods, a record of all the spent txouts? I'm thinking of how things can go when a future light client is querying (in untrusting style) a full client node - perhaps in exchange for a micropayment, or a subscription or whatever - about whether various txouts are spent or unspent. Here's an anthropomorphized dialogue. (Obviously the number of rounds back and forth is higher than it needs to be, just for the sake of anthropomorphizing the exchange.)

Light client (LC): hi, is this txout (.....) spent or unspent?
Full client (FC): unspent.
LC: ...and your proof? (I don't trust you, remember!)
FC: here's the merkle chain [or similar thing for whatever the data structure is exactly] leading from your txout to the root hash, which you can acquire from the network with suitably impressive [merged-]mining effort associated therewith. (transmits a pleasantly short merkle chain)
LC: ok, thanks very much! now, what about this txout: spent or unspent?
FC: spent.
LC: ...and your proof?
FC: well, uh, the proof is the fact that I'm not sending you a merkle chain like I did with the unspent one!
LC: sorry, but that's not a proof from my point of view!

With a record of spent txouts, the FC can send a merkle chain convincing the untrusting LC of a txout's spent status, rather than having to say "trust my absence-of-proof-of-unspent to mean it's spent".

(Maybe this is overkill? The FC could just send the transaction which spends the given txout. But maybe that was a transaction that "died" (failed to be bedded down in the winning blockchain) long ago, and for the FC to send it is misleading. Admittedly, the main reason for such "dyings" is losing a double-spending battle, in which case the relevant txout is likely still in fact spent, just not by the transaction the FC is advertising. But one can imagine scenarios where the LC wants proof not just of spent status, but of where it went, and a suitable data structure could support a reply of "it's spent, and here's the tx it went into, and here's a pleasantly short merkle chain establishing this".)
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1073


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
July 12, 2012, 02:32:31 AM
 #146

It would be a bit different than that. Rather than saying "spent" it would say and prove "not unspent".

The way it would do this is by sending the leaf nodes immediately surrounding where the unspent tx would go IF it existed, along with their Merkle lineage. This would be like me proving to you there are no "fockheads" in the phone book by sending you the pages that would immediately precede and follow "fockhead" alphabetically. This assumes everything is kept in sorted order, and that is what is proposed in this thread.
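Here's a rough Python sketch of that "phone book" proof over a sorted-leaf Merkle tree. It's illustrative only -- a real scheme would also commit the leaf positions so adjacency is provable from the branches alone, and the key format is made up.

Code:
import hashlib

def H(b):
    return hashlib.sha256(hashlib.sha256(b).digest()).digest()

def merkle_root_and_branches(leaf_hashes):
    """Build a Merkle tree over already-sorted leaves; return the root and one
    authentication branch (list of (side, sibling_hash)) per leaf."""
    branches = [[] for _ in leaf_hashes]
    level = list(leaf_hashes)
    groups = [[i] for i in range(len(leaf_hashes))]    # leaves living under each node
    while len(level) > 1:
        if len(level) % 2 == 1:                        # odd level: duplicate the last node
            level.append(level[-1])
            groups.append([])
        nxt, nxt_groups = [], []
        for i in range(0, len(level), 2):
            for leaf in groups[i]:
                branches[leaf].append(('R', level[i + 1]))
            for leaf in groups[i + 1]:
                branches[leaf].append(('L', level[i]))
            nxt.append(H(level[i] + level[i + 1]))
            nxt_groups.append(groups[i] + groups[i + 1])
        level, groups = nxt, nxt_groups
    return level[0], branches

def verify_branch(leaf_hash, branch, root):
    h = leaf_hash
    for side, sibling in branch:
        h = H(h + sibling) if side == 'R' else H(sibling + h)
    return h == root

# Hypothetical sorted unspent-TxOut keys for one address.
keys = [b'txout:03', b'txout:07', b'txout:19', b'txout:42']
root, branches = merkle_root_and_branches([H(k) for k in keys])

# Prove b'txout:10' is NOT unspent: show the two adjacent leaves bracketing it.
lo, hi = 1, 2                                          # txout:07 and txout:19
assert verify_branch(H(keys[lo]), branches[lo], root)
assert verify_branch(H(keys[hi]), branches[hi], root)
assert hi == lo + 1 and keys[lo] < b'txout:10' < keys[hi]
print("txout:10 proven absent from the set committed to by", root.hex())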

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
ArticMine
Legendary
*
Offline Offline

Activity: 2282
Merit: 1046


Monero Core Team


View Profile
July 12, 2012, 03:57:25 AM
 #147


It's also useful for instantly screening an incoming unconfirmed transaction as being "good pending confirmation" versus "totally bogus and no chance of confirmation" without needing a block chain at all.

It's also useful not just for physical bitcoins, but if people start printing disposable bitcoin cash from their printer. (example: user clicks File -> Print Money to "be their own bank" instead of driving to an ATM and paying an ATM fee - something I see as wildly compatible with the average joe.  such self-printed bills would have the same requirements as physical bitcoins). 

Basically any situation where one needs to prove what the balance is in a particular bitcoin address without actually spending the coins in that address.

Concerned that blockchain bloat will lead to centralization? Storing less than 4 GB of data once required the budget of a superpower and a warehouse full of punched cards. https://upload.wikimedia.org/wikipedia/commons/8/87/IBM_card_storage.NARA.jpg https://en.wikipedia.org/wiki/Punched_card
iain
Jr. Member
*
Offline Offline

Activity: 33
Merit: 7



View Profile WWW
July 12, 2012, 07:15:11 AM
 #148

It would be a bit different than that. Rather than saying "spent" it would say and prove "not unspent".

The way it would do this is by sending the leaf nodes immediately surrounding where the unspent tx would go IF it existed, along with their Merkle lineage. This would be like me proving to you there are no "fockheads" in the phone book by sending you the pages that would immediately precede and follow "fockhead" alphabetically. This assumes everything is kept in sorted order, and that is what is proposed in this thread.


Ah, of course! Thanks for the explanation.

Perhaps a record of spent txouts would still be of some value, to cover the case where a light client wants to query a full node with "prove to me how (i.e. into what tx) this txout was spent"? (But then again, maybe the need (if any) for such a query type would be sufficiently rare and specialized that running a full node oneself can be deemed the reasonable way to meet such an "expert" need.)
maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1006


View Profile
July 12, 2012, 03:41:38 PM
 #149

Without being too snarky, we have such an index: it's called the block chain. A full node would send the transaction in which it was spent, the block header that transaction was included in, and the path through the transaction Merkle-tree linking the two.

Now in the far, distant future it might be useful to have an index of block hashes so that the lite node doesn't even have to keep track of that information, but right now the overhead of maintaining that tree vs the storage cost (about ~1.25MB per year) doesn't make sense.
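For concreteness, a minimal Python sketch of checking such a proof-of-spend. It assumes the lite node already holds the trusted header hash from its own header chain; the raw transaction, Merkle branch, and index are what the full node supplies.

Code:
import hashlib

def sha256d(b):
    return hashlib.sha256(hashlib.sha256(b).digest()).digest()

def merkle_root_from_branch(txid, branch, index):
    """Fold a Bitcoin-style Merkle branch: at each level the sibling sits on the
    left or right depending on the bit of the transaction's index at that level."""
    h = txid
    for sibling in branch:
        h = sha256d(sibling + h) if index & 1 else sha256d(h + sibling)
        index >>= 1
    return h

def header_merkle_root(header80):
    """Pull the Merkle-root field out of an 80-byte block header
    (version 4 | prev hash 32 | merkle root 32 | time 4 | bits 4 | nonce 4)."""
    assert len(header80) == 80
    return header80[36:68]

def proves_spend(spending_tx_raw, branch, index, header80, trusted_header_hash):
    """Accept the proof iff the header is one the lite node already trusts (by
    hash) and the branch links the spending transaction to that header's root."""
    if sha256d(header80) != trusted_header_hash:
        return False
    txid = sha256d(spending_tx_raw)
    return merkle_root_from_branch(txid, branch, index) == header_merkle_root(header80)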

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
iain
Jr. Member
*
Offline Offline

Activity: 33
Merit: 7



View Profile WWW
July 14, 2012, 06:51:05 PM
 #150

Without being too snarky, we have such an index: it's called the block chain. A full node would send the transaction in which it was spent, the block header that transaction was included in, and the path through the transaction Merkle-tree linking the two.

Now in the far, distant future it might be useful to have an index of block hashes so that the lite node doesn't even have to keep track of that information, but right now the overhead of maintaining that tree vs the storage cost (roughly 1.25MB per year) doesn't make sense.

Yes indeed, you're quite right, my worry was groundless. Well then, this is really exciting, the road to a secure lightweight client is open and clear!
mp420
Hero Member
*****
Offline Offline

Activity: 501
Merit: 500


View Profile
July 26, 2012, 10:38:40 AM
 #151

I tried to read the OP again and I have a question. I apologize for the fact that I'm not very Bitcoin-literate and I may be asking things that are obvious.

When a lite node is trying to check the balance of a particular address, it needs to have downloaded the (pruned) alt-chain and every main-chain block added since the last alt-chain block.

Or is there a zero-trust way for a full node to assure the lite-node that a TxOut hasn't been spent since the last altchain block that does not require the lite node to ever download full primary-chain blocks?

Of course if we drop the zero-trust requirement, it's trivial to set up a third party service that does the validation for the lite clients up to the last alt-chain block.
Maged
Legendary
*
Offline Offline

Activity: 1204
Merit: 1006


View Profile
July 26, 2012, 12:10:04 PM
 #152

When a lite node is trying to check the balance of a particular address, it needs to have downloaded the (pruned) alt-chain and every main-chain block added since the last alt-chain block.
Pretty much. That being said, keep in mind that this would allow someone to become a full node without much trust while ONLY having to download that much.

maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1006


View Profile
July 26, 2012, 01:46:31 PM
 #153

When a lite node is trying to check the balance of a particular address, it needs to have downloaded the (pruned) alt-chain and every main-chain block added since the last alt-chain block.
Pretty much. That being said, keep in mind that this would allow someone to become a full node without much trust while ONLY having to download that much.
No, the lite client need only download the alt-chain headers since the last checkpoint, and then can request a path through the Merkle-tree to the unspent TxOut (or the surrounding outputs, if it has in fact been spent).

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1073


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
July 26, 2012, 04:24:06 PM
 #154

When a lite node is trying to check the balance of a particular address, it needs to have downloaded the (pruned) alt-chain and every main-chain block added since the last alt-chain block.
Pretty much. That being said, keep in mind that this would allow someone to become a full node without much trust while ONLY having to download that much.

When a lite node is trying to check the balance of a particular address, it needs to have downloaded the (pruned) alt-chain and every main-chain block added since the last alt-chain block.
Pretty much. That being said, keep in mind that this would allow someone to become a full node without much trust while ONLY having to download that much.
No, the lite client need only download the alt-chain headers since the last checkpoint, and then can request a path through the Merkle-tree to the unspent TxOut (or the surrounding outputs, if it has in fact been spent).

Between the above two conflicting answers, I consider the bottom one the more correct.

Neither answer is false, but as I understand it, the way the OP wants to structure the tree, it would be possible to either prove or disprove the existence of funds with a simple query to someone else, and to trust the answer with nothing more than the hash of the latest block.  Trusting that latest hash would typically require downloading the block headers, but not the entire blocks themselves.  What Maged says you can do, you can indeed do; but what maaku says you can do, I believe you can do as well.

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
allten
Sr. Member
****
Offline Offline

Activity: 450
Merit: 250


You Don't Bitcoin 'till You Mint Coin


View Profile WWW
July 27, 2012, 04:58:32 AM
 #155

Still trying to understand whether pruning the block chain is a good idea. I need a little help understanding.
One of the premises of Bitcoin is that we only need to trust math and cryptography: anyone can download the entire block chain, verify all the signatures, hashes, etc., and confirm that it 100% complies with the math, cryptography, and Bitcoin's protocols. However, is this still possible once transactions with spent outputs are removed? It would seem that you would have to start trusting that there wasn't a mass conspiracy in the mining community to give themselves bitcoins.

Sorry, I really want to understand the detailed technicals so please have patience with my lack of understanding.
I'll do my best to understand if someone will explain it.

I can definitely see that pruned/compressed blockchains would be beneficial for many applications, but it worries me that they might become the norm for everyone.

Thanks to all working on what I also see as one of the more pressing issues with bitcoin right now.
socrates1024
Full Member
***
Offline Offline

Activity: 125
Merit: 104


Andrew Miller


View Profile
July 27, 2012, 05:44:41 AM
 #156

"Pruning the trees" is a red herring. Here is another way of explaining what this proposal is about.

Bitcoin already supports a Zero-Trust validation technique that is also 100% pruned - it only requires storing a single O(1) block hash. The problem is that it's blatantly inefficient. What we are doing with Merkle trees is altering the blockchain to make this technique, which is inherently fully pruned and zero-trust, more efficient.

[Inefficient O(N), Zero-trust, 100%-pruned technique] (It's just a counterexample, bear with me.) The entire bitcoin blockchain is already represented by the current block hash. Every client, even the lightest of light clients, stores the current block hash. If you know your current block hash, then you can't be fooled about any of the data in the chain. For example, if you want to check that a particular txoutput hasn't been spent, you iterate backwards through the blocks all the way to that transaction, checking that no transaction along the way spends it.

This isn't efficient. It literally involves processing the whole damn chain each time you validate a transaction. So, current light clients certainly don't do this - instead they have to trust a helper to validate that the transactions aren't double-spends. Full nodes have an easier time because they have the storage available to maintain an indexed database of unspent txouts.

[Merkle trees O(log N), Zero-trust, 100%-pruned] What we are all proposing in various ways is to alter how the blockchain history is represented in the current block hash. In addition to the previous block data, we will also include a 'snapshot' of the unspent txoutputs database in each block hash. This snapshot is the root of a Merkle tree, which gives us little shortcuts so we can validate a transaction much more efficiently, O(log N) per validation.
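To make the contrast concrete, here is a toy side-by-side of the two techniques. The Block/Tx objects are hypothetical stand-ins (not real formats), and verify_path stands for any standard Merkle-path check.

Code:
# Toy contrast only; not real block or transaction formats.
from collections import namedtuple

Tx = namedtuple("Tx", "spends creates")      # each field: a set of outpoints
Block = namedtuple("Block", "txs")

def is_unspent_naive(chain, outpoint):
    """O(N): walk the whole chain backwards looking for a spend of `outpoint`."""
    for block in reversed(chain):
        for tx in block.txs:
            if outpoint in tx.spends:        # someone already spent it
                return False
            if outpoint in tx.creates:       # reached the creating tx, still unspent
                return True
    return False                             # never created at all

def is_unspent_committed(utxo_root, outpoint, inclusion_proof, verify_path):
    """O(log N): check a Merkle path from `outpoint` up to the unspent-TxOut
    snapshot root that the current block commits to."""
    return verify_path(outpoint, inclusion_proof, utxo_root)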

amiller on freenode / 19G6VFcV1qZJxe3Swn28xz3F8gDKTznwEM
[my twitter] [research@umd]
I study Merkle trees, credit networks, and Byzantine Consensus algorithms.
maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1006


View Profile
July 27, 2012, 06:57:02 AM
 #157

Neither answer is false, but as I understand it, the way the OP wants to structure the tree, it would be possible to either prove or disprove the existence of funds with a simple query to someone else, and to trust the answer with nothing more than the hash of the latest block.  Trusting that latest hash would typically require downloading the block headers, but not the entire blocks themselves.  What Maged says you can do, you can indeed do; but what maaku says you can do, I believe you can do as well.
Yes, I should have been more explicit. I interpreted the original question as “what is the minimum necessary action that needs to be taken by a client to verify the status of a TxOut”. Maged's solution is perfectly valid and would result in the correct answer, but would also be more work for the client than is strictly necessary.

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
mp420
Hero Member
*****
Offline Offline

Activity: 501
Merit: 500


View Profile
July 27, 2012, 01:02:37 PM
 #158

Thanks, I think I understand it now: the Merkle tree is arranged so that there's always a path from the latest hash to each unspent TxOut, so only headers are required.

I think this idea is very clever and elegant. Of course the devil is in the implementation details. I'd very much like to see this implemented.
BrightAnarchist
Donator
Legendary
*
Offline Offline

Activity: 853
Merit: 1000



View Profile
July 27, 2012, 03:12:24 PM
 #159

I'd very much like to see this implemented.

Same here. So far more than 20 coins have been pledged  to the person or persons that implement this: https://bitcointalk.org/index.php?topic=93606 Hopefully more pledges soon...
allten
Sr. Member
****
Offline Offline

Activity: 450
Merit: 250


You Don't Bitcoin 'till You Mint Coin


View Profile WWW
July 27, 2012, 06:21:41 PM
 #160

Before I read this I just want to quickly post that, justifiably or not, I personally feel this is the most pressing issue for Bitcoin's successful future, and I really hope the core team has its priorities planned accordingly.

In the past, I had floated the idea that handling blockchain size was critical, but other issues seemed more pressing for the devs at the time -- I didn't have a solid idea to promote, and the blockchain size wasn't so out of hand yet.

One nice benefit of this solution is that because it's an alt-chain, technically no core devs have to be on board.  It can be done completely independently and operate completely non-disruptively, even with only the support of other devs who believe in it.  I'd certainly like to get core devs interested in it, as they are very smart people who probably have a lot of good ideas to add.  But one of the biggest upsides here is that it can be done completely independently.

I read some more of your proposal today and now have a better idea of how it works and what you are accomplishing. Good proposal, BTW.
I'm still very concerned that this development will make the full blockchain database less accessible for download.
It's important for Bitcoin that anyone can fully audit the blockchain and only need to trust math and cryptography, not the miners (even though the probability that enough miners would conspire to give themselves coins in the compressed blockchain is extremely low).

I completely agree that there are many applications and situations in which a compressed blockchain would be extremely useful. Please update us on your view of the balance between systems that should keep the full block chain and ones that should use the compressed version. The way I read the OP proposal (probably because of my fears) is that this compressed form is meant to replace the full block chain in all instances.


jimbobway
Legendary
*
Offline Offline

Activity: 1304
Merit: 1014



View Profile
July 27, 2012, 09:48:06 PM
 #161

Before I read this I just want to quickly post that, justifiably or not, I personally feel this is the most pressing issue for Bitcoin's successful future, and I really hope the core team has its priorities planned accordingly.

In the past, I had floated the idea that handling blockchain size was critical, but other issues seemed more pressing for the devs at the time -- I didn't have a solid idea to promote, and the blockchain size wasn't so out of hand yet.

One nice benefit of this solution is that because it's an alt-chain, technically no core devs have to be on board.  It can be done completely independently and operate completely non-disruptively, even with only the support of other devs who believe in it.  I'd certainly like to get core devs interested in it, as they are very smart people who probably have a lot of good ideas to add.  But one of the biggest upsides here is that it can be done completely independently.

I read some more of your proposal today and now have a better idea of how it works and what you are accomplishing. Good proposal, BTW.
I'm still very concerned that this development will make the full blockchain database less accessible for download.
It's important for Bitcoin that anyone can fully audit the blockchain and only need to trust math and cryptography, not the miners (even though the probability that enough miners would conspire to give themselves coins in the compressed blockchain is extremely low).

I completely agree that there are many applications and situations in which a compressed blockchain would be extremely useful. Please update us on your view of the balance between systems that should keep the full block chain and ones that should use the compressed version. The way I read the OP proposal (probably because of my fears) is that this compressed form is meant to replace the full block chain in all instances.




The transaction cost just needs to be increased so it serves as an incentive for miners to support the main block chain...I think.
ripper234
Legendary
*
Offline Offline

Activity: 1358
Merit: 1002


Ron Gross


View Profile WWW
July 28, 2012, 12:08:52 PM
 #162

Finally had the time to parse and understand the first message in this thread.

+1 for the initiative, it's a good addition to Bitcoin.

Does this information appear in the wiki somewhere?
Does anyone care to TL;DR the rest of the thread for me?

Is there a bounty jar for this? (We can use Booster.io to open one)

Please do not pm me, use ron@bitcoin.org.il instead
Mastercoin Executive Director
Co-founder of the Israeli Bitcoin Association
etotheipi
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1048


Core Armory Developer


View Profile WWW
July 28, 2012, 02:28:24 PM
 #163

FYI, I have been too swamped with work-work and keeping Armory from breaking under the increased blockchain load to spend too much time on this specific proposal.  I have not lost interest by any means, I just need to catch my breath after some big deadlines.  Then I'll be taking some time off to work explicitly on some Bitcoin stuff... including this proposal.

I'm going to spend some time in the near future looking at space efficiency of a couple variants of the trie data-structure.  I'm not sure exactly how this theoretical datastructure can be merged with a disk-based DB engine (I imagine that what I have in mind is not used by an existing, acceptable DB engine) but maybe there's a way to make a hybrid.  This is a problem that still needs to be resolved before we can move forward with an implementation:  Once we agree on a datastructure, how do we use it but avoid re-inventing the wheel in terms of robust, scalable disk-based database engines?

The more I've been thinking about it, the more I have become convinced that a trie-like structure is necessary.  Not only are query and insert times constant, the determinism of the tree structure for a given set of tree nodes means that queries and inserts can be parallelized.  For instance, the tree could be implemented with the first layer (the first 256 nodes of a 256-way trie) split into different files/DBs/processes/servers for each branch.  Then every new piece can be distributed and queued for its particular destination.  It could even be distributed amongst different, independent servers.  This seems advantageous.

For reference, I learned of this particular tree in my data structures class in college, but I find no reference that anyone else has ever heard of it.  In my class it was called a "De La Brandia Tree/Trie".  A Patricia tree is a level-compressed trie.  A "de la brandia tree" is a Patricia tree that uses a linked list of pointers instead of a constant-size array.  That is, in a 256-way trie or Patricia tree, each branch node contains 256 pointers (8 bytes each), which could point to other nodes.  However, the sparseness of lower-level nodes means that the nodes will frequently have 255 null pointers, with only one relevant pointer.  The de-la-brandia tree represents all child pointers with an ordered linked list.  It has some extra overhead for nodes that have mostly-full child lists, but I think the overhead from such near-full nodes will be tiny compared to the space saved on sparse nodes.
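As a rough illustration of that layout (not Armory code, and insertion is omitted), the sketch below uses an ordered child list to stand in for the linked list, plus a Patricia-style skip string per node; 20-byte address keys are assumed.

Code:
# Hypothetical node layout: Patricia-style level compression ("skip" strings)
# plus De la Briandais-style ordered child lists instead of 256-pointer arrays.

class TrieNode:
    __slots__ = ("skip", "children", "value")

    def __init__(self, skip=b"", value=None):
        self.skip = skip        # bytes shared by every key beneath this node
        self.children = []      # ordered (next_byte, TrieNode) pairs; stands in
                                # for the linked list of non-null child pointers
        self.value = value      # leaf payload, e.g. that address's subtree root

    def child(self, byte):
        """Short linear scan, analogous to walking the linked list."""
        for b, node in self.children:
            if b == byte:
                return node
            if b > byte:        # list is ordered, so we can stop early
                break
        return None

def lookup(root, key):
    """Follow skip strings and child bytes; hops are bounded by len(key)."""
    node, i = root, 0
    while node is not None:
        if key[i:i + len(node.skip)] != node.skip:
            return None                      # key diverges inside the skip string
        i += len(node.skip)
        if i == len(key):
            return node.value
        node = node.child(key[i])
        i += 1
    return None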

When I get some more time, I'll make some pictures, and update the original post with the updated proposal.  

Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
Serith
Sr. Member
****
Offline Offline

Activity: 269
Merit: 250


View Profile
July 28, 2012, 03:39:46 PM
Last edit: July 28, 2012, 04:06:57 PM by Serith
 #164

For reference, I learned of this particular tree in my data structures class in college, but I find no reference that anyone else has ever heard of it.  In my class it was called a "De La Brandia Tree/Trie".

Did you notice that over the last 2-3 years Google search has gradually become really smart? It feels like there is a full-scale AI behind it. I found the data structure you were looking for: it's called a "De la Briandais" tree.
jimbobway
Legendary
*
Offline Offline

Activity: 1304
Merit: 1014



View Profile
July 28, 2012, 04:01:59 PM
 #165

Not sure if this helps:

http://wiki.postgresql.org/wiki/IndexingXMLData#Patricia_Trie
http://stackoverflow.com/questions/355051/how-do-you-store-a-trie-in-a-relational-database

etotheipi
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1048


Core Armory Developer


View Profile WWW
July 28, 2012, 07:01:02 PM
 #166

For reference, I learned of this particular tree in my data structures class in college, but I find no reference that anyone else has ever heard of it.  In my class it was called a "De La Brandia Tree/Trie".

Did you notice that over the last 2-3 years Google search has gradually become really smart? It feels like there is a full-scale AI behind it. I found the data structure you were looking for: it's called a "De la Briandais" tree.

Oh, if I had stopped typing so fast into Google, I probably would've noticed the autocompletion answer.  Interesting that I always thought it was "brandia" instead of "brandais".  That's what I get for never going to class... (though I did stay up all night debugging the insert function for a Patricia tree).

Speaking of that, when I do a search for the correct name, I get a lot of links to the exact class I took when I attended UIUC:
http://www.cs.uiuc.edu/class/fa05/cs225/cs225/_notes/_section/cs225ta4/Documents/bsttries.pdf

It's not the most exhaustive introduction to trie structures, but if you are already familiar with the concepts, you can get the gist of it.  And in fact, I was really proposing the Patricia/Brandais hybrid tree.  A pure "Brandais" tree uses linked lists but is not level-compressed.

The linked list may also make it easier to combine children to produce the "hash" of a particular node:  You only need to concatenate the non-null children hashes, which will frequently be very few elements (and the "skip string").  And in those cases, you just hash consecutively through the linked list.  And the dense nodes near the top can be cached for when they need to be recalculated.  This will also reduce the amount of data that needs to be transmitted to communicate a branch of the tree to a node (though, it's probably still more than I originally estimated). 

Unfortunately, the correct path forward once these structures need to move from RAM to disk is beyond me.

Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
allten
Sr. Member
****
Offline Offline

Activity: 450
Merit: 250


You Don't Bitcoin 'till You Mint Coin


View Profile WWW
July 28, 2012, 08:03:06 PM
 #167

Before I read this I just want to quickly post that, justifiably or not, I personally feel this is the most pressing issue for Bitcoin's successful future, and I really hope the core team has its priorities planned accordingly.

In the past, I had floated the idea that handling blockchain size was critical, but other issues seemed more pressing for the devs at the time -- I didn't have a solid idea to promote, and the blockchain size wasn't so out of hand yet.

One nice benefit of this solution is that because it's an alt-chain, technically no core devs have to be on board.  It can be done completely independently and operate completely non-disruptively, even with only the support of other devs who believe in it.  I'd certainly like to get core devs interested in it, as they are very smart people who probably have a lot of good ideas to add.  But one of the biggest upsides here is that it can be done completely independently.

I read some more of your proposal today and now have a better idea of how it works and what you are accomplishing. Good proposal, BTW.
I'm still very concerned that this development will make the full blockchain database less accessible for download.
It's important for Bitcoin that anyone can fully audit the blockchain and only need to trust math and cryptography, not the miners (even though the probability that enough miners would conspire to give themselves coins in the compressed blockchain is extremely low).

I completely agree that there are many applications and situations in which a compressed blockchain would be extremely useful. Please update us on your view of the balance between systems that should keep the full block chain and ones that should use the compressed version. The way I read the OP proposal (probably because of my fears) is that this compressed form is meant to replace the full block chain in all instances.




The transaction cost just needs to be increased so it serves as an incentive for miners to support the main block chain...I think.

Yes, that is the correct answer. I'm just concerned that miners will be convinced that they are "supporting" the block chain with only the compressed/pruned version on their disk drive.
That is why I'm asking etotheipi to clarify how this will be presented to the miner community. That is, it's a tool for the miners and everyone else, but the miners should still support the full blockchain.
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1073


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
July 28, 2012, 09:54:36 PM
 #168

Yes, that is the correct answer. I'm just concerned that miners will be convinced that they are "supporting" the block chain with only the compressed/pruned version on their disk drive.
That is why I'm asking etotheipi to clarify how this will be presented to the miner community. That is, it's a tool for the miners and everyone else, but the miners should still support the full blockchain.

The way I understand it, the miners will be supporting the block chain, regardless of whether they have part of it or all of it.

The only thing a miner is supporting at any given time is the hash of the latest block and the new transactions he is adding into his block.  Nothing more.  All of the history is covered simply by reference to the prior block hash.  Whether or not he has a local copy of the first billion rolls of Satoshi Dice on his hard drive is irrelevant to his ability to mine.

I don't worry one bit that the original unabridged block chain will ever go extinct.  Enough people care about it, the cost to maintain it is low, and all it takes is one historian to seed it for everyone else who wants it.

The way I see it, the only really critical reason to demand full blocks all the way back to point-in-time X is to maximize the probability that he is not being fed an attack fork without enough information to detect it.  A reasonable hunch for a good value of X might be a week for the average client, and a few months for a miner.  It could be argued that someone investing in a serious mining operation (like a pool) "needs" more assurance, but someone running a serious mining operation also likely has the skills to determine for himself whether he has the correct block chain, and that assurance is arguably just as good as having more block history.

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
ripper234
Legendary
*
Offline Offline

Activity: 1358
Merit: 1002


Ron Gross


View Profile WWW
July 29, 2012, 05:22:30 AM
 #169

FYI, I opened a bounty jar for this project at booster.io. Please add a link to the OP.

Quote
Ultimate Blockchain Pruning is a proposed alt-chain data structure that will enhance core Bitcoin scalability and allow for trust-free light clients. It does not compete with Bitcoin, but rather complements and strengthens it.

This bounty will be awarded to the first person or group who completes all these tasks:
1. Implement UBP
2. Get at least 15% of the hash power to merge-mine it
3. Patch at least one major Bitcoin client to support UBP mode
4. Benchmark the result and show an improvement of at least 10% in downloading the blockchain from scratch

This is quite an undertaking ... so you better donate if you want to encourage this idea.

Please do not pm me, use ron@bitcoin.org.il instead
Mastercoin Executive Director
Co-founder of the Israeli Bitcoin Association
maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1006


View Profile
July 29, 2012, 06:30:56 AM
Last edit: July 29, 2012, 07:47:35 PM by maaku
 #170

FYI, I opened a bounty jar for this project at booster.io. Please add a link to the OP.
Sent 2.5BTC.

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
btharper
Sr. Member
****
Offline Offline

Activity: 389
Merit: 250



View Profile
July 29, 2012, 10:40:03 PM
 #171

Since the alt-chain can only update as often as the main chain, would using a different difficulty mechanism make sense? Of course, whether or not merged mining is used matters here.

For example: instead of saying that the first hash below the difficulty target wins, is there any way to say that the absolute lowest hash wins? The biggest issue I can see without an obvious answer is how to keep the chain from backpedaling if someone releases a better block N-1 while everyone else is working on block N, but it might just be a variant on the 51% attack.

TL;DR - would a different difficulty mechanism be warranted?
maaku
Legendary
*
expert
Offline Offline

Activity: 905
Merit: 1006


View Profile
July 29, 2012, 11:19:45 PM
 #172

No, it'll work as-is. The alt-chain mints blocks independently of the main-chain (excepting merged-mined blocks where they are both minted at the same time).

I'm an independent developer working on bitcoin-core, making my living off community donations.
If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
btharper
Sr. Member
****
Offline Offline

Activity: 389
Merit: 250



View Profile
July 30, 2012, 01:04:28 AM
Last edit: July 30, 2012, 03:34:33 AM by btharper
 #173

No, it'll work as-is. The alt-chain mints blocks independently of the main-chain (excepting merged-mined blocks where they are both minted at the same time).
Everything would work as-is, but I don't think this chain would work quite the same, since this alt-chain essentially needs to have the same number of blocks as the primary bitcoin chain. If the alt-chain catches up, there's no incentive, or value, in mining anything else on it, which otherwise "wastes" the workers that are participating on the chain.

My main point is that linking the block count to the main chain one-to-one changes things somewhat. Or is this not as big of an issue as I'm thinking it is?

Edit: Fixed typos made while using my phone.
gmaxwell
Moderator
Legendary
*
expert
Offline Offline

Activity: 3570
Merit: 5742



View Profile
July 30, 2012, 01:54:13 AM
 #174

The alt-chain mints blocks independently of the main-chain

An alt-"chain" is probably the wrong way to think of this.  It's committed data. There doesn't need to be a chain.  Each bitcoin block could have 0 or 1 tree commitments of a given type.
btharper
Sr. Member
****
Offline Offline

Activity: 389
Merit: 250



View Profile
July 30, 2012, 03:44:31 AM
 #175

The alt-chain mints blocks independently of the main-chain

An alt-"chain" is probably the wrong way to think of this.  It's committed data. There doesn't need to be a chain.  Each bitcoin block could have 0 or 1 tree commitments of a given type.

The chain has established itself as a good proof-of-work system, which is the largest reason to stick with it that I can see right off hand. However, it may run more like P2Pool in that storing old blocks (other than headers) may be slightly useless, or at least contrary to the goal of eliminating extra data. Compared to normal chains, setting up the alt-chain with either "none" or "some" updates may be the way to go to preserve simplicity. For one of the data-structure guys: would there be an advantage in marking updates as "a few" vs "many" in terms of packing the data? Maybe something else worth looking into for someone who knows how the current setup would work.

As a much more random aside, any idea what the alt-chain coins could be used for, and whether transfer would be worthwhile? If transfer doesn't matter, just give 1 coin per block and let people sign with their keys how many they've accumulated by helping to secure the chain for lite nodes.  Smiley
DiThi
Full Member
***
Offline Offline

Activity: 156
Merit: 100

Firstbits: 1dithi


View Profile
July 30, 2012, 12:03:57 PM
 #176

The "alt-chain" term confuses people. It's more like a "sub-chain" of the main blockchain. It doesn't generate any coins at all, and it's a temporal fix until the majority of miners support it.

1DiThiTXZpNmmoGF2dTfSku3EWGsWHCjwt
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1073


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
July 30, 2012, 04:07:18 PM
 #177

The alt-chain mints blocks independently of the main-chain

An alt-"chain" is probably the wrong way to think of this.  It's committed data. There doesn't need to be a chain.  Each bitcoin block could have 0 or 1 tree commitments of a given type.

+100

The term that makes the most sense for me is a "meta-tree".  It would never be "mined" - it would simply be committed to in normal blocks - optionally - at least until it is proven to work in practice and a decision is made to make it a mandatory part of the protocol.


Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
gmaxwell
Moderator
Legendary
*
expert
Offline Offline

Activity: 3570
Merit: 5742



View Profile
July 31, 2012, 03:44:42 PM
 #178

One recent revelation I've had as a result of Pieter's ultraprune implementation is that any tree commitment scheme should also commit to undo logs so that nodes don't necessarily have to all individually store all the data required to reorg forever.
etotheipi
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1048


Core Armory Developer


View Profile WWW
July 31, 2012, 06:10:57 PM
 #179

One recent revelation I've had as a result of Pieter's ultraprune implementation is that any tree commitment scheme should also commit to undo logs so that nodes don't necessarily have to all individually store all the data required to reorg forever.

Perhaps that's another reason to use an insert-order-invariant tree structure.  If you have to reverse some transactions, you don't have to worry about the order in which you undo them.  Undoing a block/tx is just as easy as adding it in the first place.

Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
gmaxwell
Moderator
Legendary
*
expert
Offline Offline

Activity: 3570
Merit: 5742



View Profile
July 31, 2012, 08:28:12 PM
 #180

Perhaps that's another reason to use an insert-order-invariant tree structure.  If you have to reverse some transactions, you don't have to worry about the order in which you undo them.  Undoing a block/tx is just as easy as adding it in the first place.

You still have to have the complete data that you would remove. E.g. when I spend txn X, I don't specify all of X's data to spend it (that's buried elsewhere in the chain), only the hash. Order invariance wouldn't let me recover that. I need some kind of undo data, even if it's just the location of the original txn so that I could fetch it (though it's more efficient if you can serve me up a whole undo block).
etotheipi
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1048


Core Armory Developer


View Profile WWW
July 31, 2012, 08:32:12 PM
Last edit: August 01, 2012, 09:50:41 AM by etotheipi
 #181

Perhaps that's another reason to use an insert-order-invariant tree structure.  If you have to reverse some transactions, you don't have to worry about the order in which you undo them.  Undoing a block/tx is just as easy as adding it in the first place.

You still have to have the complete data that you would remove. E.g. when I spend txn X, I don't specify all of X's data to spend it (that's buried elsewhere in the chain), only the hash. Order invariance wouldn't let me recover that. I need some kind of undo data, even if it's just the location of the original txn so that I could fetch it (though it's more efficient if you can serve me up a whole undo block).

Yeah, I actually realized that and was editing my previous post to say this:

It's not actually trivial to reverse blocks in any particular pure-pruned scheme, since adding blocks involves deleting UTXOs.  So reversing the block means you have to re-add UTXOs that are not defined by the block you are reversing (it only references the OutPoint by hash:index, not the whole UTXO).  So you have to either save them or request them from another node.  Perhaps the solution is to keep a circular buffer of the last N UTXOs that were removed, as long as N is enough to cover the last, say 50 blocks.  Any time you use map.delete when updating the tree, you use a buffer.add to save it (which will also discard the oldest element in the buffer).  Then when you need to reverse a tx, you know the OutPoints that need to be re-added, and if N is large enough, you'll have it in the buffer.  Worst case, you have to fetch some data from another node, which isn't terrible.
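A minimal sketch of that rollback buffer, assuming a hypothetical UTXO db object with insert/delete methods keyed by OutPoint; the class name, interface, and default depth are illustrative only.

Code:
# Illustrative undo buffer: remember, for the last `depth` blocks, every UTXO
# that connecting the block deleted, so disconnecting the block can re-add them.
from collections import OrderedDict

class UndoBuffer:
    def __init__(self, depth=50):
        self.depth = depth
        self.removed = OrderedDict()     # block_hash -> [(outpoint, txout), ...]

    def connect_block(self, block_hash, spent_entries, created_entries, db):
        for outpoint, txout in spent_entries:     # UTXOs consumed by this block
            db.delete(outpoint)
        for outpoint, txout in created_entries:   # UTXOs created by this block
            db.insert(outpoint, txout)
        self.removed[block_hash] = list(spent_entries)
        while len(self.removed) > self.depth:     # forget the oldest undo data
            self.removed.popitem(last=False)

    def disconnect_block(self, block_hash, created_entries, db):
        """Reverse a block: drop what it created, restore what it spent."""
        for outpoint, _ in created_entries:
            db.delete(outpoint)
        saved = self.removed.pop(block_hash, None)
        if saved is None:
            raise LookupError("undo data expired; fetch the spent TxOuts from a peer")
        for outpoint, txout in saved:
            db.insert(outpoint, txout)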


Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
casascius
Mike Caldwell
VIP
Legendary
*
Offline Offline

Activity: 1386
Merit: 1073


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW
July 31, 2012, 09:13:22 PM
 #182

It's probably fodder for another topic, but I am of the opinion that in the event of a very large reorg (where more than 6 blocks are rolled back), the proper behavior of a bitcoin client should be to stop functioning and demand that a new client be downloaded from the developer(s) of that client, and to assist the user in exporting their wallet and ensuring their replacement client is properly signed by the developer.  This is based on philosophizing that it would be better for the bitcoin network to go down in an orderly fashion in the event of an attack - long enough for developers to agree on countermeasures tailored to the attack - just like the recent space rocket that changed its mind and aborted the launch at the last second - rather than for it to stay up and operate chaotically and at a financial loss to users.

Mentioning that is not intended to derail the thread - I am certain there isn't a consensus on what I just said, and it may very well be an unacceptable idea.

But I am saying it because: If it were ever to be debated and determined that what I threw out IS in fact a good idea, it would also settle the question as to how to ensure the tree can be rolled back.  The answer would be simple: at a minimum, keep a copy of the tree as it looked at the point representing the maximum amount we're willing to roll back without ceasing to function.


Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
jgarzik
Legendary
*
qt
Offline Offline

Activity: 1596
Merit: 1022


View Profile
August 01, 2012, 12:27:12 AM
 #183

It's probably fodder for another topic, but I am of the opinion that in the event of a very large reorg (where more than 6 blocks are rolled back), the proper behavior of a bitcoin client should be to stop functioning and demand that a new client be downloaded from the developer(s) of that client, and to assist the user in exporting their wallet and ensuring their replacement client is properly signed by the developer.  This is based on philosophizing that it would be better for the bitcoin network to go down in an orderly fashion in the event of an attack - long enough for developers to agree on countermeasures tailored to the attack - just like the recent space rocket that changed its mind and aborted the launch at the last second - rather than for it to stay up and operate chaotically and at a financial loss to users.

During the output overflow incident, we were all rather glad that the collective client base did not do this.  Once miners upgraded, everyone reorg'd back into working order.

This is just one of many reasons why there is a 120-block new-bitcoin confirmation policy in place.


Jeff Garzik, Bloq CEO, former bitcoin core dev team; opinions are my own.
Visit bloq.com / metronome.io
Donations / tip jar: 1BrufViLKnSWtuWGkryPsKsxonV2NQ7Tcj
etotheipi
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1048


Core Armory Developer


View Profile WWW
August 01, 2012, 09:39:48 AM
Last edit: August 01, 2012, 09:51:36 AM by etotheipi
 #184

But I am saying it because: If it were ever to be debated and determined that what I threw out IS in fact a good idea, it would also settle the question as to how to ensure the tree can be rolled back.  The answer would be simple: at a minimum, keep a copy of the tree as it looked at the point representing the maximum amount we're willing to roll back without ceasing to function.

I think it's an interesting idea, and as you suggested, I don't want to derail the thread.  However, I think that the amount of history to maintain is not a critical question.  Standard reorgs on the main chain are rarely more than 1 block.  We save enough info for 10 blocks, and if a reorg happens further back than that (and the client is willing to continue), then the node can just request the missing information from peers.  It's all verifiable information, and if you are trying to catch up to the longest chain there should be plenty of peers who can supply it.


Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
socrates1024
Full Member
***
Offline Offline

Activity: 125
Merit: 104


Andrew Miller


View Profile
August 03, 2012, 09:56:25 PM
 #185

It's probably fodder for another topic, but I am of the opinion that in the event of a very large reorg (where more than 6 blocks are rolled back), the proper behavior of a bitcoin client should be to stop functioning and demand that a new client be downloaded from the developer(s) of that client, and to assist the user in exporting their wallet and ensuring their replacement client is properly signed by the developer.  This is based on philosophizing that it would be better for the bitcoin network to go down in an orderly fashion in the event of an attack - long enough for developers to agree on countermeasures tailored to the attack - just like the recent space rocket that changed its mind and aborted the launch at the last second - rather than for it to stay up and operate chaotically and at a financial loss to users.

Mentioning that is not intended to derail the thread - I am certain there isn't a consensus on what I just said, and it may very well be an unacceptable idea.

But I am saying it because: If it were ever to be debated and determined that what I threw out IS in fact a good idea, it would also settle the question as to how to ensure the tree can be rolled back.  The answer would be simple: at a minimum, keep a copy of the tree as it looked at the point representing the maximum amount we're willing to roll back without ceasing to function.



Perhaps one reason to prefer a balanced binary tree is that if you want to store a snapshot of the tree at every previous block, you only have to store the differences. There is a "Persistent Authenticated Datastructure" that would let you handle rollbacks efficiently.


(This image is from "Persistent Authenticated Dictionaries and their Applications" by Anagnostopoulos Goodrich and Tamassia http://cs.brown.edu/people/aris/pubs/pad.pdf )

I do not know if this is possible with tries.
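For illustration, here is a bare-bones version of the path-copying trick from that paper, stripped of balancing and of the authentication hashes: each insert returns a new root and shares every untouched subtree with the old version, so keeping a snapshot per block costs O(log N) new nodes rather than a full copy.

Code:
# Minimal persistent (path-copying) binary search tree; purely illustrative.

class PNode:
    __slots__ = ("key", "value", "left", "right")
    def __init__(self, key, value, left=None, right=None):
        self.key, self.value, self.left, self.right = key, value, left, right

def insert(root, key, value):
    """Return the root of a NEW version; the old version rooted at `root`
    is left untouched and shares all unmodified subtrees with the new one."""
    if root is None:
        return PNode(key, value)
    if key < root.key:
        return PNode(root.key, root.value, insert(root.left, key, value), root.right)
    if key > root.key:
        return PNode(root.key, root.value, root.left, insert(root.right, key, value))
    return PNode(key, value, root.left, root.right)   # overwrite only in the new version

# One root pointer per block is enough to keep every snapshot; rolling back a
# block is just switching back to the older root.
v0 = None
v1 = insert(v0, b"txout-a", 1)
v2 = insert(v1, b"txout-b", 2)    # v1 still describes the previous block's state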

amiller on freenode / 19G6VFcV1qZJxe3Swn28xz3F8gDKTznwEM
[my twitter] [research@umd]
I study Merkle trees, credit networks, and Byzantine Consensus algorithms.
etotheipi
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1048


Core Armory Developer


View Profile WWW
August 05, 2012, 07:06:59 PM
 #186

Perhaps one reason to prefer a balanced binary tree is that if you want to store a snapshot of the tree at every previous block, you only have to store the differences. There is a "Persistent Authenticated Datastructure" that would let you handle rollbacks efficiently.

[Image suppressed]

(This image is from "Persistent Authenticated Dictionaries and their Applications" by Anagnostopoulos Goodrich and Tamassia http://cs.brown.edu/people/aris/pubs/pad.pdf )

I do not know if this is possible with tries.

I'm not 100% sure I understand that picture.  It looks like you are saving the pointer-tree of each state, though there's only ever one copy of each piece of underlying data.  Multiple trees will be maintained, but leaf nodes will point to global copies of TxOut data.   Is that correct?

In that case, saving the state of the tree at a previous time means storing a whole lot of pointer data.  If I consider a basic binary search tree and use 8-byte pointers (assuming it's all in memory), then each node in your binary tree is going to store at least two pointers (ptrLeft & ptrRight) and probably a few more bytes for random stuff like isLeaf and isRedOrBlack, etc.  So I assume 20 bytes per tree node.  For a tree of 2 million unspent TxOuts, that's about 40 MB of storage for each state you want to store (each block).  And that's going to go up linearly with the size of the tree.
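For concreteness, the back-of-the-envelope arithmetic behind that 40 MB figure, using the rough assumptions stated above (these are estimates, not measurements):

Code:
# Rough assumptions from the paragraph above: two 8-byte pointers per node
# plus a few bytes of flags, one tree node per unspent TxOut.
bytes_per_node = 2 * 8 + 4           # ptrLeft + ptrRight + bookkeeping ~= 20 bytes
unspent_txouts = 2 * 1000 * 1000
snapshot_bytes = bytes_per_node * unspent_txouts
print(snapshot_bytes / 1e6, "MB of pointer data per saved tree state")   # ~40 MB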


(NOTE: I'm discussing this as if it's a single tree full of TxOuts, even though my proposal is about a main tree full of addresses containing unspent TxOuts, and subtrees of TxOuts.  However, the concepts are still valid, it's just easier to make my points as if it's a single tree of TxOuts).

On the other hand, if you're using a trie, you don't have to save the state of the entire trie since its structure is purely deterministic for a given set of addresses.  You only have to save the nodes that were removed that might have to be re-added later in the event of a rollback.  You look at the OutPoints of the TxIns in each tx of the block that need to be reversed, and re-add them to the trie.  Then remove the ones that were added by those transactions.  This is all standard O(1) trie operations.

Thus, the data that needs to be stored in order to roll back a block in the trie (besides the block data itself) is proportional to the transaction volume on the network.  And transaction volume is capped by network rules.  Currently, each block could be up to 1MB.  If you filled a block full of transactions that were entirely TxIns with nothing else, it would result in removing about 7200 TxOuts from the tree.  That means that the absolute upper limit for storage per block you want to be able to roll back is about 250 kB, regardless of tree size.


Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
socrates1024
Full Member
***
Offline Offline

Activity: 125
Merit: 104


Andrew Miller


View Profile
August 08, 2012, 12:16:45 AM
 #187

I'm not assuming it's all in memory. I'm not talking about pointers, but about a graph of hashes.

To be sure we're on the same page - we are talking about a merkle trie with one hash at the root, and hashes at each level, right?

Also you keep saying O(1) when you mean O(log N). I don't think you would agree with the following statement:
    "Each transaction is associated with a unique hash. There are only 2^256 hashes, therefore the number of transactions is O(1)."

Still, I am leaning towards thinking the simplicity of the trie outweighs any marginal cost benefits of the balanced binary tree. They're of the same order in all operations.

amiller on freenode / 19G6VFcV1qZJxe3Swn28xz3F8gDKTznwEM
[my twitter] [research@umd]
I study Merkle trees, credit networks, and Byzantine Consensus algorithms.
etotheipi
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1048


Core Armory Developer


View Profile WWW
August 08, 2012, 12:42:27 AM
 #188

I'm not assuming it's all in memory. I'm not talking about pointers, but about a graph of hashes.

To be sure we're on the same page - we are talking about a merkle trie with one hash at the root, right?

There does appear to be a disconnect here.  Let me explain what my understanding of your proposal is:

(1) There is a database of TxOuts.  It has all current, unspent TxOuts in it.  TxOuts are added and deleted (or marked deleted) when blocks come in.
(2) The TxOuts need to be organized into a tree of some sort in order to come up with the single "merkle root" to be included in the meta-chain header ("merkle" is in quotes, because using a binary tree or trie it's no longer a "merkle" tree).  The relationships between the DB elements in (1) will be represented either as a binary search tree or a trie, using "pointers".
(3) Thus, after every block, you update the database of TxOuts (1), and the pointers in the binary search tree or trie (2).  I was focusing on (2) when I said that you need to store pointers.  Not necessarily in RAM... but somewhere, data needs to be held that identifies the structure of the tree/trie associated with the TxOuts in database (1).
(4) A classic binary search tree held in RAM uses pointers -- specifically a left pointer and a right pointer for each node.  In this context, whether it's held in RAM or on disk, there are still at least 8 bytes needed per pointer to represent the relationships.  And a binary tree will use about 1 such pointer per node.

This is not a negligible amount of data.   If TxOuts are 35 bytes, pointers are 8 bytes, then a "tree" of pointers will be about 25% of the size of the entire TxOut database.  It's a rough estimate, not intended to be exact.

So now back to my original point: it looks to me that in order to "save the state" of the tree, you need to save the entire tree of pointers between nodes (2).  And you need the full tree.  Sure, the TxOut database stores each TxOut only once, but saving all the pointers to represent the past states of the tree will triple your space requirements just to save 8 blocks.


Also you keep saying O(1) when you mean O(log N). I don't think you would agree with the following statement:
    "Each transaction is associated with a unique hash. There are only 2^256 hashes, therefore the number of transactions is O(1)."

Another disconnect here.  I'm not sure where your example came from, but here's what I'm referring to -- let's compare a binary search tree full of Bitcoin addresses to a trie of Bitcoin addresses (the entire set of all addresses on the network containing unspent TxOuts).  Assume at a given time there are N such addresses.

Binary Search Tree (red-black or any balanced tree) -- depth is O(log(N)).   
   -- Querying an element is O(logN)
   -- Inserting an element is O(logN)
   -- Deleting an element is O(logN) 
Basic trie:  depth of the tree is equal to the length of the longest key:  so the depth is 20 (because addresses are 20 bytes).
   -- Querying an element is O(1) -- it takes 20 hops to get from the root node to the leaf you want
   -- Inserting an element is O(1) -- it takes 20 hops to get from the root node to the leaf you want
   -- Deleting an element is O(1) -- it takes 20 hops to get from the root node to the leaf you want

So even if there are 100 quintillion bitcoin addresses with unspent TxOuts in the master alt-chain tree, it still takes you exactly 20 hops to query, insert or delete a value in the trie (and less if you're using a patricia tree).

Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
socrates1024
Full Member
***
Offline Offline

Activity: 125
Merit: 104


Andrew Miller


View Profile
August 08, 2012, 12:46:23 AM
 #189

Could you describe how you would update the root hash for the trie?

Also, it most certainly takes fewer than 40 hops* to get from the root of the balanced binary tree to any leaf. Log N is less than 20.

*Assuming one hop is actually "8 hops". I don't feel like I'm making this point too clear. You can have a 'balanced B tree' that has up to 8 children at each level. Then there are only 20 hops.

Or to put it another way, if there are so many transactions that we have collisions by birthday paradox, then we would need to pick a bigger hash.

amiller on freenode / 19G6VFcV1qZJxe3Swn28xz3F8gDKTznwEM
[my twitter] [research@umd]
I study Merkle trees, credit networks, and Byzantine Consensus algorithms.
etotheipi
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1048


Core Armory Developer


View Profile WWW
August 08, 2012, 01:25:33 AM
Merited by ETFbitcoin (1)
 #190

Could you describe how you would update the root hash for the trie?

In both BSTs and trie structures there will be leaf nodes and branch nodes.  In some structures (such as the BST) branch nodes may also contain leaf data.  In the context of this proposal, the master tree will be the set of all addresses containing any unspent TxOuts.  So a given leaf represents one such address.  

--For the purposes of organizing the tree/trie, that leaf's value will be the 20-byte address.  So the 20-byte address is the "key" of the master tree.
--For the purposes of computing the root hash, that leaf's value is the root of the subtree of unspent TxOuts for that address (which is constructed in a similar way to the address tree)
--For the purposes of computing the root hash, all branch nodes' hash values are the hash of the concatenation of their children's hash values.

In the simplest case, you have only pure branch nodes and pure leaf nodes.   This looks very much like the merkle trees we already use.  Then each node's "hash value" is the hash of the two children "hash values" concatenated.  [LeftPtrNodeValue || RightPtrNodeValue].  You walk up the tree computing this for every node until you get to the root.

In the BST case where nodes are actually both branch nodes and leaf nodes, then you just use hash([LeafValue | LeftPtrNodeValue | RightPtrNodeValue]).  

In the Patricia/Hybrid Tree case, there are purely branch nodes and leaf nodes, though the branch nodes may have "skip strings".  So a leaf node's hash value is just the root hash of the subtree.  And a branch node's value is the hash of the concatenated skip string and its non-null child node values.

When you add or remove a node from the tree, you are changing the hash value of its parent, which changes its parent, etc.  So for a BST, to add or delete a node, you have to recompute/update O(logN) other nodes' hash values to get the new root value.  Same thing for the trie, except there are exactly 20 parents to update, regardless of the size of the tree (and see below for why this will ultimately be only 4-6).

I really need to make a picture...  (coming soon)
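In the meantime, here is a small sketch of that hashing rule and of the incremental update, under the same illustrative assumptions as the earlier trie sketch (the node layout and the cached field are hypothetical): a leaf hashes its skip string with its per-address subtree root, a branch hashes its skip string plus its non-null children's hashes in order, and only the nodes on the changed leaf's path to the root are rehashed.

Code:
# Illustrative only: node hashing and O(path-length) root updates for the trie.
import hashlib

def h(data):
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

class Node:
    def __init__(self, skip=b"", value=None):
        self.skip = skip        # Patricia "skip string"
        self.children = []      # ordered (next_byte, Node) pairs
        self.value = value      # leaves: root hash of that address's TxOut subtree
        self.cached = None      # last computed hash of this node

    def rehash(self):
        if self.value is not None:                       # leaf node
            self.cached = h(self.skip + self.value)
        else:                                            # branch node: skip string
            payload = self.skip + b"".join(              # plus non-null child hashes
                bytes([b]) + child.cached for b, child in self.children)
            self.cached = h(payload)
        return self.cached

def update_leaf(path_from_root, new_subtree_root):
    """`path_from_root`: nodes from the trie root down to the affected leaf.
    Assumes every node off this path already has an up-to-date `cached` hash."""
    path_from_root[-1].value = new_subtree_root
    for node in reversed(path_from_root):                # leaf first, root last
        node.rehash()
    return path_from_root[0].cached                      # new root commitment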


It most certainly takes fewer than 40 hops to get from the root of the balanced binary tree to any leaf. Log N is less than 20.

Right, the number of hops in a red-black tree will be 2*logN in the worst case.  So your point is well taken that N will probably never get high enough in this environment for the query/insert/delete time of a basic trie to absolutely beat the binary search tree.  Both of them have completely acceptable query/insert/delete times.  So two things to mention:
(1) It's still entirely accurate to label trie operations as O(1).  It just may not be relevant for this application.
(2) I'm actually proposing a Patricia tree, which is level-compressed.  So a typical access time will be 4-6 hops.  Even with trillions of nodes in the tree.  The absolute max is 20 but it would never be realized.

Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
ripper234
Legendary
*
Offline Offline

Activity: 1358
Merit: 1002


Ron Gross


View Profile WWW
August 08, 2012, 04:59:37 AM
 #191

Perhaps it's time to formulate this in the wiki?
Maybe as a BIP draft?

Or is it too soon?

Please do not pm me, use ron@bitcoin.org.il instead
Mastercoin Executive Director
Co-founder of the Israeli Bitcoin Association
etotheipi
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1048


Core Armory Developer


View Profile WWW