maaku
Legendary
Offline
Activity: 905
Merit: 1012
|
|
May 31, 2013, 12:01:53 AM Last edit: May 31, 2013, 12:16:42 AM by maaku |
|
Well, it is straightforward to demonstrate to a lightweight client that such other script hashes relate to the same address. A full node could respond to a UTXO query with the asked-for data plus a message along the lines of "and hey, btw, did you know about these scripts using the same pubkey: ...?" attached at the end. The client can verify for itself that indeed, those scripts use the same pubkey or P2SH hash, and then query for those outputs as well.
Obviously this isn't ideal, but it might be good enough (you wouldn't need consensus for this data, just be attached to at least 1 honest full node maintaining an index of this information). Or I suppose yet another (3rd!) index can be maintained mapping addresses to scripts actually present in the UTXO set, but as you note that wouldn't be future-proof.
|
I'm an independent developer working on bitcoin-core, making my living off community donations. If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
|
|
|
oakpacific
|
|
June 02, 2013, 12:58:31 PM |
|
Is it possible for a miner to only download full blocks from the last checkpoint and still validate transactions?
|
|
|
|
maaku
Legendary
Offline
Activity: 905
Merit: 1012
|
|
June 02, 2013, 04:11:13 PM |
|
Possible? Yes. Desirable? No. It's important that miners verify that they haven't been duped onto a side chain. It is, however, okay for them to throw away those historical transactions once they have been verified and just keep the UTXO set.
|
I'm an independent developer working on bitcoin-core, making my living off community donations. If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
|
|
|
etotheipi (OP)
Legendary
Offline
Activity: 1428
Merit: 1093
Core Armory Developer
|
|
June 02, 2013, 10:41:13 PM |
|
Well, it is straightforward to demonstrate to a lightweight client that such other script hashes relate to the same address. A full node could respond to a UTXO query with the asked-for data plus a message along the lines of "and hey, btw, did you know about these scripts using the same pubkey: ...?" attached at the end. The client can verify for itself that indeed, those scripts use the same pubkey or P2SH hash, and then query for those outputs as well.
Obviously this isn't ideal, but it might be good enough (you wouldn't need consensus for this data, just be attached to at least 1 honest full node maintaining an index of this information). Or I suppose yet another (3rd!) index can be maintained mapping addresses to scripts actually present in the UTXO set, but as you note that wouldn't be future-proof.
There are two problems with this logic:

(1) Every request is X-fold more work for the serving peer, just to catch the 0.01% of addresses that have multiple forms in the blockchain. It has to do multiple lookups for each request.

(2) Something like "<message> OP_DROP ..." is a serious problem for this proposal. The task of "find all UTXOs for address/pubkey X, which have a message prefixed to them" requires a full search of the UTXO space. Such scripts lose all benefit of this proposal. In fact, any node using these kinds of scripts will have to use the original full-node-but-pruned logic, unless the extraneous data is totally deterministic/predictable.

Number 2 is concerning, because even if nodes somehow know all the messages they are expecting to see, the proofs of existence (or non-existence) are on isolated branches and require quite a bit more data to prove than if they were all clustered on one branch. And if they don't know the messages (or other data unrelated to the addr/pubkey), then the peer might be expected to do the UTXO search. Peers might end up adding extra metadata to their databases just to accommodate these kinds of requests. On the other hand, since address re-use is a bad idea, maybe the argument about isolated branches is irrelevant.

I have an idea that is beyond infeasible, but it's good food for thought. Maybe it's useless, but what the hell: for a given script, if it contains a single public key or single hash160 value (how do we define this?), then we use the hash160 value, or compute it (from the public key), and prefix that to the script, replacing the value actually in the script with a 0xff byte (or something unused). This is okay technically (if it were possible to identify which values will be used as pubkeys/hash160s), because it's technically not a requirement for the lookup key to match the script. The real script will still be stored at the leaf node.
So "DUP HASH160 <addr160> EQUALVERIFY CHECKSIG" would be keyed: "<addr160> DUP HASH160 0xFF EQUALVERIFY CHECKSIG". This theoretically solves all the problems at once, because it doesn't really matter what else is in the script, as long as the "controlling address" comes first. Then it's trivial to prove inclusion or exclusion of data for a given address. But of course, it's probably intractable to figure out how to reduce arbitrary scripts to "controlling addresses."
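For the standard pay-to-pubkey-hash case, the re-keying could be sketched like this (the opcode byte values are Bitcoin's real script encoding; the 0xFF placeholder and the function name are only illustrative, and the sketch assumes the single "controlling" 20-byte value can always be identified):

```python
# Opcode byte values from Bitcoin's script encoding.
OP_DUP, OP_HASH160, OP_EQUALVERIFY, OP_CHECKSIG = 0x76, 0xA9, 0x88, 0xAC
PLACEHOLDER = 0xFF  # stands in for the extracted 20-byte push (illustrative)

def key_for_p2pkh(addr160: bytes) -> bytes:
    """Index key for DUP HASH160 <addr160> EQUALVERIFY CHECKSIG:
    the controlling hash160 first, then the script skeleton with the
    pushed value replaced by a single unused byte."""
    assert len(addr160) == 20
    skeleton = bytes([OP_DUP, OP_HASH160, PLACEHOLDER,
                      OP_EQUALVERIFY, OP_CHECKSIG])
    return addr160 + skeleton
```

All keys for the same hash160 would then share a 20-byte prefix, so proofs of inclusion or exclusion for an address cluster on a single branch of the tree.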
|
|
|
|
maaku
Legendary
Offline
Activity: 905
Merit: 1012
|
|
June 03, 2013, 12:13:51 AM |
|
I think one of us is misunderstanding the other. Bitcoin already has code for extracting pubkeys, hash160(pubkey)'s, or hash160(script)'s from arbitrary transactions as they come in. That's how the wallet code knows whether a transaction is yours or not.
What I'm suggesting is that some full nodes remember and index these results for every transaction in the UTXO set, creating a map of hash160 -> script variants. They then expose the ability for other nodes to query this map, or proactively do so if/when they receive a UTXO query.
This hash160 -> script(s) map doesn't need to be deterministic or authenticated in any way. If given a script, anyone can examine it to see that yes, it does indeed have the relevant pubkey, hash160(pubkey), or p2sh embedded within it, and query the authenticated UTXO index to see that yes, there are unspent outputs using this script. Therefore we don't need to solve the possibly intractable, definitely not future-proof problem of figuring out a general way to match arbitrary scripts to “controlling addresses.” We can use bitcoind's current logic, and are free to update that logic on each release.
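A minimal sketch of such a side-index (all names here are illustrative, not bitcoind's actual API; the point is that the map itself needs no authentication because the client re-checks every suggestion):

```python
from collections import defaultdict

# hash160 value -> raw scripts in the UTXO set that embed it.
# The map is advisory only: nothing about it is consensus-critical.
script_variants = defaultdict(set)

def index_script(script: bytes, extracted_hashes):
    """Record a script under every hash160 the node's extraction logic
    (e.g. bitcoind's existing wallet-matching code) pulled out of it."""
    for h in extracted_hashes:
        script_variants[h].add(script)

def plausible_variant(h160: bytes, script: bytes) -> bool:
    """A client need not trust the serving node: it merely checks that a
    suggested script really embeds the hash160 it asked about, then queries
    the authenticated UTXO index for that exact script."""
    return h160 in script  # bytes containment is a substring check
```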
|
I'm an independent developer working on bitcoin-core, making my living off community donations. If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
|
|
|
etotheipi (OP)
Legendary
Offline
Activity: 1428
Merit: 1093
Core Armory Developer
|
|
June 03, 2013, 12:25:22 AM |
|
I think one of us is misunderstanding the other. Bitcoin already has code for extracting pubkeys, hash160(pubkey)'s, or hash160(script)'s from arbitrary transactions as they come in. That's how the wallet code knows whether a transaction is yours or not.
What I'm suggesting is that some full nodes remember and index these results for every transaction in the UTXO set, creating a map of hash160 -> script variants. They then expose the ability for other nodes to query this map, or proactively do so if/when they receive a UTXO query, and provide a p2p message for gossiping these mappings.
This hash160 -> script(s) map doesn't need to be deterministic or authenticated in any way. If given a script, anyone can examine it to see that yes, it does indeed have the relevant pubkey, hash160(pubkey), or p2sh embedded within it, and query the authenticated UTXO index to see that yes, there are unspent outputs using this script. Therefore we don't need to solve the possibly intractable, definitely not future-proof problem of figuring out a general way to match arbitrary scripts to “controlling addresses.” We can use bitcoind's current logic, and are free to update that logic on each release.
Okay, I was proposing a change to the way the UTXO tree is keyed, for the purposes of having a consistent way to look up balances/UTXOs by hash160 value instead of by raw script. You are proposing we keep it keyed/indexed by raw script, but nodes can store their own metadata for those addresses as they see recognizable scripts. It would not be a requirement to serve the entirety of all UTXOs spendable by someone with the private key of a given hash160, but it can still be useful. Is that a correct interpretation of what you're saying?

My problem with that is that it isn't deterministic whether you will be able to find all your coins. Maybe it doesn't matter: "Use standard scripts if you want to be able to find all your coins efficiently. Otherwise, you're on your own." With the way you suggest it: if you get lucky, nodes have the data pre-indexed for you, and have all of it. But you can't count on it, and they can't prove whether they supplied you everything. This makes it considerably less useful, and possibly not useful (nodes need to be able to know if they have everything, or they'll do something else that guarantees it).

I think ultimately the raw-script indexing is the correct answer. I'm just exploring alternatives that make the multi-script-single-address problem a little more efficient (and reliable).
|
|
|
|
oakpacific
|
|
June 03, 2013, 12:36:19 AM |
|
Possible? Yes. Desirable? No. It's important that miners verify that they haven't been duped onto a side chain. It is, however, okay for them to throw away those historical transactions once they have been verified and just keep the UTXO set.
Yeah, I did not mention the UTXO set because I thought it was obvious. The reason I brought this up is that I believe a lot of us are willing to run a USB miner to secure the network without generating any noticeable revenue. Now that they're out and very power-efficient, the power cost of keeping one running is somewhat negligible, but if we have to download and store the rapidly growing full chain, the cost may grow significantly.
|
|
|
|
maaku
Legendary
Offline
Activity: 905
Merit: 1012
|
|
June 03, 2013, 01:21:21 AM |
|
I propose that we keep the UTXO index keyed by hash160(scriptPubKey) - for different reasons, primarily for the benefit of a constant key size. That would make things more difficult for your proposal. But yes, other than that we are interpreting each other correctly now.

My problem with that is that it isn't deterministic whether you will be able to find all your coins. Maybe it doesn't matter: "Use standard scripts if you want to be able to find all your coins efficiently. Otherwise, you're on your own." With the way you suggest it: if you get lucky, nodes have the data pre-indexed for you, and have all of it. But you can't count on it, and they can't prove whether they supplied you everything. This makes it considerably less useful, and possibly not useful (nodes need to be able to know if they have everything, or they'll do something else that guarantees it).

I pretty much agree with you, except on the last point. It wouldn't be something you'd want to rely upon, but it would be better than nothing, and perhaps there would be valuable use cases when connected to a trusted node. My hesitation is with specifying a general algorithm for identifying addresses/hashes to prefix scripts with - one that is deterministic and future-proof, so that we can use it in an authenticated data structure.

Possible? Yes. Desirable? No. It's important that miners verify that they haven't been duped onto a side chain. It is, however, okay for them to throw away those historical transactions once they have been verified and just keep the UTXO set.

Yeah, I did not mention the UTXO set because I thought it was obvious. The reason I brought this up is that I believe a lot of us are willing to run a USB miner to secure the network without generating any noticeable revenue. Now that they're out and very power-efficient, the power cost of keeping one running is somewhat negligible, but if we have to download and store the rapidly growing full chain, the cost may grow significantly.

The miner could validate the entire history, or synchronize with constant storage requirements, throwing away data as it is no longer required.
|
I'm an independent developer working on bitcoin-core, making my living off community donations. If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
|
|
|
etotheipi (OP)
Legendary
Offline
Activity: 1428
Merit: 1093
Core Armory Developer
|
|
June 03, 2013, 01:33:56 AM |
|
I propose that we keep the UTXO index keyed by hash160(scriptPubKey) - for different reasons, primarily for the benefit of a constant key size. That would make things more difficult for your proposal. But yes, other than that we are interpreting each other correctly now.

If we're going to use "essentially" the raw script, we'll save quite a bit of space by making it the key of the tree; then we don't have to store the actual script at the leaf nodes. I am blanking on all the other data that needs to be stored, but it might be considerable savings (I haven't thought about this in a while). Well, it would save exactly 20 bytes per leaf. Right now I think there are 6 million UTXOs, so that would be 120 MB of savings. The constant key size would be important in a trie, but in a level-compressed PATRICIA-like tree, it shouldn't make a difference.
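A quick sanity check of that arithmetic, using the figures as stated:

```python
# If keying the tree by raw script lets each leaf omit a separate
# 20-byte hash160 key, then at roughly 6 million unspent outputs:
utxo_count = 6_000_000
saved_per_leaf = 20  # bytes
total_mb = utxo_count * saved_per_leaf / 1_000_000
print(total_mb)  # 120.0 (MB), matching the estimate above
```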
|
|
|
|
maaku
Legendary
Offline
Activity: 905
Merit: 1012
|
|
June 03, 2013, 02:09:37 AM |
|
Well, that's a reasoned argument I can listen to. Might I suggest, then, that the key format be the following: <20 bytes> idx:var_int <...n-bytes...>
Where idx is an index in the range [0, n] specifying where the prefixed 20 bytes should be spliced back into the script. This step is skipped for scripts/keys of 20 bytes or less. The only issue then is an unambiguous algorithm for selecting which 20 bytes to pull out for the prefix.
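A sketch of that encoding, assuming the single-byte var_int case suffices for typical script lengths (the helper names are mine, not from any proposal):

```python
def make_key(script: bytes, offset: int) -> bytes:
    """Build the proposed key <20 bytes> idx:var_int <rest>: pull the
    20 bytes at `offset` to the front and record the splice point.
    Scripts/keys of 20 bytes or less are used as-is, per the rule above."""
    if len(script) <= 20:
        return script
    assert 0 <= offset <= len(script) - 20 and offset < 0xFD
    prefix = script[offset:offset + 20]
    rest = script[:offset] + script[offset + 20:]
    return prefix + bytes([offset]) + rest  # single-byte var_int case only

def restore_script(key: bytes) -> bytes:
    """Invert make_key for keys built from scripts longer than 20 bytes."""
    prefix, offset, rest = key[:20], key[20], key[21:]
    return rest[:offset] + prefix + rest[offset:]
```

Round-tripping a standard P2PKH script (hash at offset 3, after DUP HASH160 and the push byte) recovers the original script exactly, so the index key remains a lossless re-encoding.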
|
I'm an independent developer working on bitcoin-core, making my living off community donations. If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
|
|
|
etotheipi (OP)
Legendary
Offline
Activity: 1428
Merit: 1093
Core Armory Developer
|
|
June 03, 2013, 02:13:32 AM |
|
Well, that's a reasoned argument I can listen to. Might I suggest, then, that the key format be the following: <20 bytes> idx:var_int <...n-bytes...>
Where idx is an index in the range [0, n] specifying where the prefixed 20 bytes should be spliced back into the script. This step is skipped for scripts/keys of 20 bytes or less. The only issue then is an unambiguous algorithm for selecting which 20 bytes to pull out for the prefix. Hah, you just jumped all the way to the other side. I was actually just suggesting that if we're not going to do anything fancy with the scripts, we should use the exact script as the key instead of its hash. That way we don't have to store the script itself in both the key and the value of the leaf node (even though actually only the hash160(rawscript) would be in the key).
|
|
|
|
maaku
Legendary
Offline
Activity: 905
Merit: 1012
|
|
June 03, 2013, 04:21:59 AM |
|
Oh, my conservative position is still that prefixing the index key is the wrong way to solve this problem, but I'm willing to explore the idea.
|
I'm an independent developer working on bitcoin-core, making my living off community donations. If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
|
|
|
ThomasV
Legendary
Offline
Activity: 1896
Merit: 1353
|
|
June 03, 2013, 12:30:03 PM |
|
I believe this proposal is of primary importance for Electrum. I started to work on it a few weeks ago, in order to add it to Electrum servers, and I finalized two preliminary implementations this weekend. I am pretty agnostic concerning the choice of keys; I guess hash160(rawscript) makes sense.

I would like to make the following point: it is possible to compute node hashes much faster if you store the hash of a node at the key of its parent. That way, it is not necessary to perform database requests on all children when only one child is updated. In order to do that, it is necessary to keep a list of child pointers at each node; this list uses a bit more space (20 bytes/node). Thus, each node stores a list of pointers (20 bytes) and a variable-length list of hash:sumvalue pairs for its children.

I made two separate implementations:
- a "plain vanilla" version without pointers, where a node's hash is stored at the node; this implementation was too slow to be practical.
- a faster version that stores node hashes at their parent, and keeps a list of pointers for each node.

Both versions are available on github, in the Electrum server code: https://github.com/spesmilo/electrum-server (look for branches "hashtree" and "hashtree2"). Both branches were tested with randomly generated blockchain reorgs, and they produced the same root hash. I could run the "hashtree2" version for 184k blocks on my vps, and another user went over 200k using a faster machine, but it still took him more than 24h. I am currently working on a third version that will use write batches when computing the hashes; I hope to further accelerate it that way.
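The store-hashes-at-the-parent idea could be sketched like this (the structure is illustrative only, not Electrum's actual schema):

```python
import hashlib

class Node:
    """Interior node that caches each child's hash and sum-of-values,
    so updating one child recomputes the path to the root without
    fetching the untouched siblings from the database."""
    def __init__(self):
        # branch key -> (db_pointer, child_hash, child_sumvalue)
        self.children = {}

    def set_child(self, branch, pointer, child_hash, child_sum):
        self.children[branch] = (pointer, child_hash, child_sum)

    def node_hash(self):
        # Hash over the cached child digests and sums; no sibling reads.
        h = hashlib.sha256()
        for branch in sorted(self.children):
            _, child_hash, child_sum = self.children[branch]
            h.update(child_hash + child_sum.to_bytes(8, "big"))
        return h.digest()
```

The trade-off is exactly as described: each parent spends extra bytes caching child digests, but a single-leaf update only touches the nodes on the leaf-to-root path.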
|
Electrum: the convenience of a web wallet, without the risks
|
|
|
maaku
Legendary
Offline
Activity: 905
Merit: 1012
|
|
June 03, 2013, 03:57:20 PM |
|
Am I reading the source code correctly that you are doing a standard Merkle-list for the UTXO tree? I couldn't find anything that looked like balanced tree updates. I'd think that's the root of your inefficiency right there - PATRICIA trees are a big part of this proposal.
You are right that this impacts Electrum significantly. We should coordinate our efforts.
|
I'm an independent developer working on bitcoin-core, making my living off community donations. If you like my work, please consider donating yourself: 13snZ4ZyCzaL7358SmgvHGC9AxskqumNxP
|
|
|
ThomasV
Legendary
Offline
Activity: 1896
Merit: 1353
|
|
June 03, 2013, 04:15:32 PM |
|
Am I reading the source code correctly that you are doing a standard Merkle-list for the UTXO tree? I couldn't find anything that looked like balanced tree updates. I'd think that's the root of your inefficiency right there - PATRICIA trees are a big part of this proposal.
I use a PATRICIA tree for addresses, and a simple list for the UTXOs that belong to the same address. I remember discussing this question on IRC; we were not sure if it was better to store UTXOs as database entries or to use addresses for the leaves of the tree (which is what I do). Note that if we use a PATRICIA tree of UTXOs, we might end up doing more database queries for the hashes; what makes you think it would be less efficient?
|
Electrum: the convenience of a web wallet, without the risks
|
|
|
ShadowOfHarbringer
Legendary
Offline
Activity: 1470
Merit: 1006
Bringing Legendary Har® to you since 1952
|
|
June 05, 2013, 10:13:26 AM |
|
* ShadowOfHarbringer is watching this.
|
|
|
|
jtimon
Legendary
Offline
Activity: 1372
Merit: 1002
|
|
June 06, 2013, 11:48:38 AM |
|
I've been thinking that if the indexes are put directly in each main-chain block AND miners include a "signature" demonstrating the computational integrity of all transaction validations in the chain, new full nodes only need to download the last block and are completely safe!!! I wonder if block N's signature can be combined somehow with block N+1's transactions... Does anybody know Ben's nick on the forum?
|
|
|
|
d'aniel
|
|
June 09, 2013, 11:22:03 PM |
|
I've been thinking that if the indexes are put directly in each main-chain block AND miners include a "signature" demonstrating the computational integrity of all transaction validations in the chain, new full nodes only need to download the last block and are completely safe!!! I wonder if block N's signature can be combined somehow with block N+1's transactions... Does anybody know Ben's nick on the forum?

After watching that video, I can't help but think, with my very limited understanding of it, that SCIP, combined with appropriate Bitcoin protocol changes (perhaps, as you mentioned, localizing the full state in the blockchain using an authenticated UTXO tree), will be able to remove most of the reproduction of work the network currently must do in order to operate trust-free. It would also make it possible to shard across untrusted peers the operation of combining new transactions into Merkle trees to produce new block headers for miners to work on. This would mean the network could remain perfectly decentralized at ridiculously high transaction rates (the work done per node would, I think in theory, scale as O(M log(M) / N), where M is the transaction rate and N is the total number of network nodes). This might even mean an always-on zerocoin is feasible (always-on is important so that the anonymity set is maximal, and its users aren't a persecutable (relative) minority). Anybody with a better understanding of SCIP and its applicability to Bitcoin able to pour cold water on these thoughts for me?
|
|
|
|
etotheipi (OP)
Legendary
Offline
Activity: 1428
Merit: 1093
Core Armory Developer
|
|
June 09, 2013, 11:51:53 PM Last edit: June 10, 2013, 02:13:25 AM by etotheipi |
|
I don't have any real understanding of SCIP, but I did talk to the guys behind it, at the conference. They are very excited about their research, and it clearly is quite powerful if it works. However, they did say that it is extremely complicated, and even if it does work, it may have a tough time getting confidence from any security-conscious community due to its complexity. I imagine it will need years of fielding in order for it to actually become an option for any important application.
And of course, I have my doubts that it really works. It sounds too good to be true, but admittedly, I haven't had time to try to understand it at the technical level yet. One day I'll try to find some time to dig into it. But until then, we certainly can't count on it being available for the Reiner-tree.
I'd be extremely interested to see someone with the correct background dig into it and provide a technical overview of how it works.
|
|
|
|
Peter Todd
Legendary
Offline
Activity: 1120
Merit: 1164
|
|
June 10, 2013, 12:19:24 AM |
|
After watching that video, I can't help but think, with my very limited understanding of it, that SCIP, combined with appropriate Bitcoin protocol changes (perhaps, as you mentioned, localizing the full state in the blockchain using an authenticated UTXO tree), will be able to remove most of the reproduction of work the network currently must do in order to operate trust-free. It would also make it possible to shard across untrusted peers the operation of combining new transactions into Merkle trees to produce new block headers for miners to work on. This would mean the network could remain perfectly decentralized at ridiculously high transaction rates (the work done per node would, I think in theory, scale as O(M log(M) / N), where M is the transaction rate and N is the total number of network nodes). This might even mean an always-on zerocoin is feasible (always-on is important so that the anonymity set is maximal, and its users aren't a persecutable (relative) minority).
Anybody with a better understanding of SCIP and its applicability to Bitcoin able to pour cold water on these thoughts for me?
You're actually quite correct. It solves the censorship problem too, because the "upper levels" of this merkle tree of transactions are still cheap to validate, so mining itself remains cheap. You do have issues where someone may create an imbalanced tree - the validation rules will need to have the levels of the merkle tree be sorted - but the work required to imbalance the tree increases exponentially. To be exact, it will be a patricia/radix tree rather than a merkle tree.

However, SCIP is probably years away from getting to the point where we could use it in the Bitcoin core. One big issue is that a SCIP proof for a validated merkle tree has to be recursive, so you need to create a SCIP proof that you ran a program that correctly validated a SCIP proof. Creating those recursive proofs is extremely expensive; gmaxwell can talk more, but his rough estimate is that we'd have to hire a big fraction of Amazon EC2 and assemble a cluster of machines with hundreds of terabytes of ram. But math gets better over time, so there is hope.

A non-SCIP approach that we can do now would be to use fraud detection with punishment. Peers assemble some part of the merkle tree and digitally sign that they have done so honestly with an identity. (A commutative accumulator is another possibility.) The tree is probabilistically validated, and any detected fraud is punished somehow, perhaps by destroying a fidelity bond that the peer holds. You still need some level of global consensus so the act of destroying a bond is meaningful, of course, and there are a lot of tricky details to get right, but the rough idea is plausible with the cryptography available to us now.
|
|
|
|
|