Using compact indexes instead of hashes as identifiers.

TierNolan

Re: Using compact indexes instead of hashes as identifiers.

March 11, 2016, 01:05:46 AM

#21

Quote from: jl777 on March 10, 2016, 09:06:58 PM

canonical encoding means a numbering system for each block, tx, vin, vout so that the same number references the same one. Since the blocks are ordered and the tx are ordered within each block and vins and vouts are ordered within each tx, this is a matter of just iterating through the blockchain in a deterministic way.

Is it just a count of outputs, or <block-height | transaction index | output index>?
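The question above distinguishes two numbering schemes. Below is a minimal Python sketch of both. It assumes a toy chain in which each block is just a list of per-transaction output counts. The first scheme is a running count of outputs in chain order. The second is a packed <block-height | transaction index | output index> value. The 24/20/20-bit field widths and all function names are illustrative assumptions, not part of any proposal in this thread.

```python
from typing import List, Tuple

# Toy chain (assumption): each block is a list of transactions,
# and each transaction is represented only by its number of outputs.
Chain = List[List[int]]

def flat_output_indexes(chain: Chain) -> List[Tuple[int, int, int, int]]:
    """Walk blocks, txs and vouts in order; return (flat, height, txind, vout)."""
    result = []
    flat = 0
    for height, block in enumerate(chain):
        for txind, num_vouts in enumerate(block):
            for vout in range(num_vouts):
                result.append((flat, height, txind, vout))
                flat += 1
    return result

def pack_position(height: int, txind: int, vout: int) -> int:
    """Pack <height | tx index | output index>; 24/20/20-bit widths are assumed."""
    assert height < 1 << 24 and txind < 1 << 20 and vout < 1 << 20
    return (height << 40) | (txind << 20) | vout

def unpack_position(packed: int) -> Tuple[int, int, int]:
    mask20 = (1 << 20) - 1
    return packed >> 40, (packed >> 20) & mask20, packed & mask20
```

The running count is more compact, but it depends on every earlier block. The positional form is wider, but it can be computed from the output's own block and position alone.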

Quote

I have no idea how this would cause any privacy loss, as it is just using 32-bit integers as pointers to the hashes. The privacy issue was somehow raised as a reason not to use efficient encoding.

Ah, OK, I guess the confusion is due to the thread split. I agree; I see no loss in privacy from referring to transactions by their historical positions.

With a hard fork, transactions could have both options. If you want to spend an output of a recent transaction, you could refer to it by hash. For transactions that are buried deeply, you could use the index. A reorg could invalidate the indexes, but that is a very low risk at 100+ confirmations.
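Here is a sketch of the "both options" idea. It assumes a hypothetical hard-forked format in which an input names its previous output either by (txid, vout) or by chain position. `HashRef`, `IndexRef`, `choose_reference` and the 100-confirmation cutoff are illustrative names and values only. Nothing in Bitcoin implements them.

```python
from dataclasses import dataclass
from typing import Union

# Cutoff taken from the "100+ confirms" remark; the exact value is an assumption.
MIN_CONFIRMS_FOR_INDEX = 100

@dataclass(frozen=True)
class HashRef:
    """Refer to a previous output the usual way: (txid, vout)."""
    txid: bytes
    vout: int

@dataclass(frozen=True)
class IndexRef:
    """Refer to a deeply buried output by its position in the chain."""
    height: int
    txind: int
    vout: int

def choose_reference(txid: bytes, height: int, txind: int, vout: int,
                     tip_height: int) -> Union[HashRef, IndexRef]:
    """Use the compact index only when a reorg is very unlikely to renumber it."""
    confirmations = tip_height - height + 1
    if confirmations >= MIN_CONFIRMS_FOR_INDEX:
        return IndexRef(height, txind, vout)
    return HashRef(txid, vout)
```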

1LxbG5cKXzTwZg9mjL3gaRE835uNQEteWF

jl777 (OP)

March 11, 2016, 01:25:11 AM

#22

Quote from: TierNolan on March 11, 2016, 01:05:46 AM

Quote from: jl777 on March 10, 2016, 09:06:58 PM

canonical encoding means a numbering system for each block, tx, vin, vout so that the same number references the same one. Since the blocks are ordered and the tx are ordered within each block and vins and vouts are ordered within each tx, this is a matter of just iterating through the blockchain in a deterministic way.

Is it just a count of outputs, or <block-height | transaction index | output index>?

Quote

I have no idea how this would cause any privacy loss, as it is just using 32-bit integers as pointers to the hashes. The privacy issue was somehow raised as a reason not to use efficient encoding.

Ah, OK, I guess the confusion is due to the thread split. I agree; I see no loss in privacy from referring to transactions by their historical positions.

With a hard fork, transactions could have both options. If you want to spend an output of a recent transaction, you could refer to it by hash. For transactions that are buried deeply, you could use the index. A reorg could invalidate the indexes, but that is a very low risk at 100+ confirmations.

I used to use a single 32-bit index for the entire chain, but that doesn't work for parallel sync, and in a few years it would actually overflow.

Now there is a 32-bit index for txidind, unspentind and spendind, covering the txids, vouts and vins within each bundle of 2000 blocks,

plus an index for the bundle, which needs fewer than 16 bits.

So (bundle, txidind), (bundle, unspentind) and (bundle, spendind) identify the corresponding txid, vout and vin within each bundle.
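Here is a minimal sketch of the bundle addressing described above. The post only states a 32-bit per-bundle index, a bundle index under 16 bits, and 2000 blocks per bundle. Packing the pair into one 48-bit integer, and the helper names, are assumptions for illustration.

```python
from typing import Tuple

BUNDLE_SIZE = 2000  # blocks per bundle, as described above
BUNDLE_BITS = 16    # bundle index fits in fewer than 16 bits
IND_BITS = 32       # per-bundle txidind / unspentind / spendind

def bundle_of(height: int) -> Tuple[int, int]:
    """Return (bundle index, block offset within that bundle)."""
    return divmod(height, BUNDLE_SIZE)

def pack_ref(bundle: int, ind: int) -> int:
    """Pack (bundle, ind) into one 48-bit integer (layout is an assumption)."""
    if not (0 <= bundle < 1 << BUNDLE_BITS and 0 <= ind < 1 << IND_BITS):
        raise ValueError("bundle or per-bundle index out of range")
    return (bundle << IND_BITS) | ind

def unpack_ref(ref: int) -> Tuple[int, int]:
    """Split a packed reference back into (bundle, ind)."""
    return ref >> IND_BITS, ref & ((1 << IND_BITS) - 1)
```

Every bundle numbers its own txids, vouts and vins from zero. Bundles can therefore be built independently and in parallel. A single chain-wide counter does not allow this, because each new number depends on every earlier block.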

but yes, use hashes for non-permanent data, index for permanent

http://www.digitalcatallaxy.com/report2015.html
100+ page annual report for SuperNET

TierNolan

March 11, 2016, 10:18:12 AM

#23

Quote from: jl777 on March 11, 2016, 01:25:11 AM

but yes, use hashes for non-permanent data, index for permanent

I thought of a possible issue with offline signing. If you want to sign a transaction offline, you need to include proof that the index refers to a particular transaction output.

Armory already has to include all the input transactions in order to figure out how much you are spending, so it isn't that big a deal. The problem is that there is no way to prove that a particular index refers to a particular output. You would need to include the header chain, and that is only SPV-safe.

For offline transactions, using the hash is probably safer, but that doesn't affect internal storage, just if transactions could refer to previous outputs by index.
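This illustrates what "proof that the index refers to a particular transaction output" could involve. Besides the header chain, the signer would need evidence that a given txid sits at a given position in a block. Below is a minimal sketch of the standard Bitcoin Merkle branch check, using only `hashlib`. The helper names are hypothetical. Hashes are treated as raw 32-byte values in internal byte order.

```python
import hashlib
from typing import List

def dsha256(data: bytes) -> bytes:
    """Bitcoin's double SHA-256."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def _next_level(level: List[bytes]) -> List[bytes]:
    if len(level) % 2:
        level = level + [level[-1]]  # odd-length level: duplicate the last hash
    return [dsha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]

def merkle_root(txids: List[bytes]) -> bytes:
    level = list(txids)
    while len(level) > 1:
        level = _next_level(level)
    return level[0]

def merkle_branch(txids: List[bytes], index: int) -> List[bytes]:
    """Prover side: sibling hashes from leaf `index` up to the root."""
    level, branch = list(txids), []
    while len(level) > 1:
        padded = level + [level[-1]] if len(level) % 2 else level
        branch.append(padded[index ^ 1])
        level = _next_level(level)
        index >>= 1
    return branch

def root_from_branch(txid: bytes, branch: List[bytes], index: int) -> bytes:
    """Verifier side: bit i of `index` says whether the node is left or right at level i."""
    h = txid
    for sibling in branch:
        h = dsha256(sibling + h) if index & 1 else dsha256(h + sibling)
        index >>= 1
    return h
```

If the recomputed root matches the header's Merkle root, the branch pins the transaction's position in that block. The header chain pins the block's height. This holds in the ordinary case, with some ambiguity at the end of a block where the last hash is duplicated. Together they tie <height | tx index> to a txid, but as noted above that is still only SPV-level assurance.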

jl777 (OP)

March 11, 2016, 01:32:19 PM

#24

Quote from: TierNolan on March 11, 2016, 10:18:12 AM

Quote from: jl777 on March 11, 2016, 01:25:11 AM

but yes, use hashes for non-permanent data, index for permanent

I thought of a possible issue with offline signing. If you want to sign a transaction offline, you need to include proof that the index refers to a particular transaction output.

Armory already has to include all the input transactions in order to figure out how much you are spending, so it isn't that big a deal. The problem is that there is no way to prove that a particular index refers to a particular output. You would need to include the header chain, and that is only SPV-safe.

For offline transactions, using the hash is probably safer, but that doesn't affect internal storage, just if transactions could refer to previous outputs by index.

Maybe I didn't make it clear in this thread. The indexes are used for traversing lists, etc., but all the txids are in the dataset. So the dataset has indexes referring to other indexes (or implicitly, by their position), and each index can be converted to its fully expanded form. Tastes great, fewer calories.
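Below is a toy model of "indexes referring to other indexes", where every index can be expanded back to its full form. The `Bundle` layout and the helpers `expand_unspent` and `expand_spend` are hypothetical. They only mirror the (bundle, txidind), (bundle, unspentind) and (bundle, spendind) scheme described earlier in the thread.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Bundle:
    txids: List[bytes]                    # txidind -> full 32-byte txid
    unspents: List[Tuple[int, int, int]]  # unspentind -> (txidind, vout number, value)
    spends: List[Tuple[int, int]]         # spendind -> (bundle, unspentind) it spends

def expand_unspent(bundles: List[Bundle], bundle: int,
                   unspentind: int) -> Tuple[str, int, int]:
    """Turn (bundle, unspentind) back into (txid hex, vout, value).
    .hex() is the raw byte order, not Bitcoin's reversed display order."""
    txidind, vout, value = bundles[bundle].unspents[unspentind]
    return bundles[bundle].txids[txidind].hex(), vout, value

def expand_spend(bundles: List[Bundle], bundle: int,
                 spendind: int) -> Tuple[str, int, int]:
    """Follow a vin's compact pointer to the output it spends, then expand it."""
    src_bundle, unspentind = bundles[bundle].spends[spendind]
    return expand_unspent(bundles, src_bundle, unspentind)
```

Traversal only touches small integers. The 32-byte txid is looked up only when the expanded form is actually needed.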

So you can do all the verifications, etc., as it is a combined local DB and raw-file replacement. Because the data is mostly read-only, mksquashfs can create a half-sized dataset, which also protects it from data corruption. I split out the sigs into separate files, so they can be purged after being validated. I am also thinking of doing the same with pubkeys, so nodes that want to search all pubkeys seen could do so, while nodes that don't won't need to keep them around.
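Here is a sketch of keeping signatures out of the main dataset so they can be purged after validation. It assumes a hypothetical fixed-size vin record (unspentind, offset, length) that points into a separate signature file. `io.BytesIO` stands in for the two files. The record layout is an illustration, not the actual file format.

```python
import struct
from io import BytesIO
from typing import List, Tuple

# Hypothetical fixed-size vin record: (unspentind, sig_offset, sig_len),
# three little-endian uint32 fields, 12 bytes in total.
VIN_FMT = "<III"
VIN_SIZE = struct.calcsize(VIN_FMT)

def write_vins(vins: List[Tuple[int, bytes]], core: BytesIO, sigs: BytesIO) -> None:
    """Append compact vin records to `core` and the raw signatures to `sigs`."""
    for unspentind, sig in vins:
        offset = sigs.tell()
        sigs.write(sig)
        core.write(struct.pack(VIN_FMT, unspentind, offset, len(sig)))

def read_vin(core: bytes, spendind: int) -> Tuple[int, int, int]:
    """Fetch vin `spendind` from the core data; no signature data is needed."""
    return struct.unpack_from(VIN_FMT, core, spendind * VIN_SIZE)
```

After validation the signature file can be discarded (here, `sigs.close()`). `read_vin` still resolves every spend to the output it consumes, because the core records never embed the signatures.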

It works as a lossless codec that stores its data ready for searching, with no setup time (for the data it has already processed). All addresses are fully indexed, even multisig/p2sh, so there is no need for importprivkey. A serial listunspent took ~2 milliseconds on a slow laptop. Because the datasets are parallel, it is well suited to multi-core searching, allowing linear speedups with the number of cores. Even GPUs could be used for industrial-strength speed, as long as there is a pipeline of requests to avoid the latency of a GPU round trip per RPC.
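Below is a sketch of how per-bundle address indexes lend themselves to parallel searching. Each bundle is scanned independently, and the results are concatenated in bundle order. The `BundleIndex` shape, the set of spent (bundle, unspentind) pairs, and the `listunspent` function itself are hypothetical. None of them is the bitcoind RPC. A `ThreadPoolExecutor` keeps the sketch simple. In CPython, real multi-core speedups for CPU-bound scans would need `ProcessPoolExecutor` or native code.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Set, Tuple

# Hypothetical per-bundle address index: address -> [(unspentind, value), ...]
BundleIndex = Dict[str, List[Tuple[int, int]]]
Spent = Set[Tuple[int, int]]  # (bundle, unspentind) pairs already spent

def scan_bundle(bundle: int, index: BundleIndex, spent: Spent,
                address: str) -> List[Tuple[int, int, int]]:
    """Unspent outputs for `address` in one bundle, as (bundle, unspentind, value)."""
    return [(bundle, u, value) for u, value in index.get(address, [])
            if (bundle, u) not in spent]

def listunspent(bundles: List[BundleIndex], spent: Spent, address: str,
                workers: int = 4) -> List[Tuple[int, int, int]]:
    """Scan every bundle concurrently and concatenate results in bundle order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(scan_bundle, b, idx, spent, address)
                   for b, idx in enumerate(bundles)]
        return [utxo for f in futures for utxo in f.result()]
```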

James
