Bitcoin Forum
February 20, 2019, 08:45:02 AM *
News: Latest Bitcoin Core release: 0.17.1 [Torrent]
 
   Home   Help Search Login Register More  
Poll
Question: Should spin-offs be launched with a "claim by" time limit?
Yes.
Yes, as long as the deadline is sufficiently far into the future.
No.
All of the above.
None of the above.

Pages: « 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 [17] 18 19 20 21 22 23 24 25 26 »
  Print  
Author Topic: Spin-offs: bootstrap an altcoin with a btc-blockchain-based initial distribution  (Read 53204 times)
Peter R
Legendary
*
Offline Offline

Activity: 1162
Merit: 1007



View Profile
June 09, 2014, 06:10:33 PM
Last edit: June 09, 2014, 10:01:24 PM by Peter R
 #321

Refining the format of snapshot.bin

DeathAndTaxes showed that over 99.9% of unspent outputs (by count) remain claimable if the snapshot.bin file is a uniform list of {{hash1, balance1}…{hashN, balanceN}}.  This file format works for Simplified Claim Verification (SCV) and Full Claim Verification (FCV).  

I updated the file format proposal for snapshot.bin below.  I know D&T discussed further compression, but I'd propose that this be done as part of the client installation procedure or during snapshot.bin transport, rather than further compressing snapshot.bin itself.  For this reason, I typically used 64-bit fields even when smaller data types would be work.  

I continued to break snapshot.bin into Sections based on output type.  This isn't strictly necessary as pubKeys and redeemScripts are all hashed to a consistent bit length, but I thought it might still be useful.  

Code:
// File header:
version (1 byte)  blockheight_of_snapshot (8 bytes) blockheight_when_live (8 bytes)  num_bytes_H (8 bytes)  header data   // total of num_bytes_H of header data

// File body:

// Section A: Wealth data for pay2PubKeyHash (sorted by hash):
tagA=0x4E (1 byte)  num_bytes_A (8 bytes)  pubkeyhash_1 (20 bytes)  balance_1 (8 bytes)  …  pubkeyhash_N (20 bytes)  balance_N (8 bytes)  // total of num_bytes_A of records here

// Section B: Wealth data for pay2PubKey (sorted by hash):
tagB=0xB2 (1 byte)  num_bytes_B (8 bytes)  hash_of_pubkey_1 (20 bytes)  balance_1 (8 bytes)  …  hash_of_pubkey_N (20 bytes)  balance_N (8 bytes)  // total of num_bytes_B of records here

// Section C: Wealth data for native multisig (sorted by hash):
tagC=0xA7 (1 byte)  num_bytes_C (8 bytes)  hash_of_pubkeysInCanonicalOrder_1 (20 bytes) balance_1 (8 bytes)  …  hash_of_pubkeysInCanonicalOrder_N (20 bytes)  balance_N (8 bytes)  // total of num_bytes_C of records here

// Section D: Wealth data for pay2ScriptHash (sorted by hash):
tagD=0x24 (1 byte)  num_bytes_D (8 bytes)  hash_of_redeemScript_1 (20 bytes)  balance_1 (8 bytes)  …  hash_of_redeemScript_N (20 bytes)  balance_N (8 bytes)  // total of num_bytes_D of records here

// Section E: Wealth data for other types of outputs:
tagE=0xB4 (1 byte)  num_bytes_E (8 bytes)  should we record the unspent outputs that don't fit into Sections A - D verbatim here, just in case?


The file header for snapshot.bin needs to contain the blockheader of the snapshot block in order for SCV to work.  I also propose that it contains every blockheader between the snapshot and the agreed-upon block where the coin "goes live."  This acts as a time stamp that proves to the world that the developers couldn't have possibly begun mining earlier than the date when, e.g., Block 314,000 was found.

Code:
// File header:
version (1 byte)  blockheight_snapshot (8 bytes)   blockheight_whenLive (8 bytes)  num_bytes_H (8 bytes)  header data   // total of num_bytes_H of header data

// Header data:
Blockheader_for_snapshotBlock (80 bytes)  Blockheader_for_snapshotBlock+1 (80 bytes) … Blockheader_for_whenLiveBlock-1 (80 bytes)   Blockheader_for_whenLiveBlock (80 bytes)

Run Bitcoin Unlimited (www.bitcoinunlimited.info)
1550652302
Hero Member
*
Offline Offline

Posts: 1550652302

View Profile Personal Message (Offline)

Ignore
1550652302
Reply with quote  #2

1550652302
Report to moderator
Your Bitcoin transactions
The Ultimate Bitcoin mixer
made truly anonymous.
with an advanced technology.
Mix coins
Advertised sites are not endorsed by the Bitcoin Forum. They may be unsafe, untrustworthy, or illegal in your jurisdiction. Advertise here.
1550652302
Hero Member
*
Offline Offline

Posts: 1550652302

View Profile Personal Message (Offline)

Ignore
1550652302
Reply with quote  #2

1550652302
Report to moderator
DeathAndTaxes
Donator
Legendary
*
Offline Offline

Activity: 1218
Merit: 1005


Gerald Davis


View Profile
June 09, 2014, 08:22:57 PM
Last edit: June 09, 2014, 11:48:24 PM by DeathAndTaxes
 #322

A couple corrections:  

In the description for Section C, drop the word "addresses" as there is no address to represent a complex output script (other than P2SH).  You also need to add an element to represent "m" (the number of signatures), "n" can be implicitly determined by the number of PubKeys.   This is pretty easy to achieve as native multisig (not be confused w/ P2SH) is limited to m<=2 and n <= 3 resulting in only five combinations possible (1 of 1, 1 of 2, 1 of 3, 2 of 2, 2 of 3).

Pay2PubKey and native multisig outputs use PubKeys not PubKeyHashes and they are 33 or 65 bytes not 20 bytes.  Only ScriptHashes and PubKeyHashes are 20 bytes.  You could either handle this by treating compressed (33 bytes ea) and uncompressed (65 bytes ea) PubKeys as a separate type ("tag" in your format) or by just parsing the first byte (0x2 or 0x3 = compressed = 32 bytes follow, 0x4 = uncompressed = 64 bytes follow).  The good news is that 99.9%+ of spendable outputs fit one of 7 standard templates.  By also having an unknown type and standard for storing arbitrary scripts (type 0 = unknown) you can store all possible claimants.

Something like this:

Code:
Type Description               UXTO Template                                                             Record_Length Storage_Format
-----------------------------------------------------------------------------------------------------------------------------------------
  0  Unknown                   -none-                                                                    variable      0x00 <scriptLen:2> <Script:scriptLen> <value:8>
  1  Pay2PubkeyHash            OP_DUP OP_HASH160  <PubKeyHash:20> OP_EQUALVERIFY OP_CHECKSIG             29 bytes      0x01 <PubKeyHash:20> <value:8>
  2  Pay2PubKey                <PubKey:33|65> OP_CHECKSIG                                                42|74 bytes   0x02 <PubKey: 33|65> <value:8>
  3  Pay2ScriptHash (P2SH)     OP_HASH160 <ScriptHash:20> OP_EQUAL                                       29 bytes      0x03 <ScriptHash: 20> <value:8>
  4  Native Multisig (1 of 1)  OP_1 <PubKey_0:33|65> OP_1 OP_CHECK_MULTISIG                              0x08 <PubKey:33|65> <value:8>
  5  Native Multisig (1 of 2)  OP_1 <PubKey_0:33|65> <PubKey_1:33|65> OP_2 OP_CHECK_MULTISIG             0x04 <PubKey:33|65> <PubKey:33|65> <value:8>
  6  Native Multisig (2 of 2)  OP_2 <PubKey_0:33|65> <PubKey_1:33|65> OP_2 OP_CHECK_MULTISIG             0x05 <PubKey:33|65> <PubKey:33|65> <value:8>
  7  Native Multisig (1 of 3)  OP_1 <PubKey_0:33|65> <PubKey_1:33|65> <PubKey2> OP_3 OP_CHECK_MULTISIG   0x06 <PubKey:33|65> <PubKey:33|65> <PubKey:33|65> <value:8>
  8  Native Multisig (2 of 3)  OP_2 <PubKey_0:33|65> <PubKey_1:33|65> <PubKey2> OP_3 OP_CHECK_MULTISIG   0x07 <PubKey:33|65> <PubKey:33|65> <PubKey:33|65> <value:8>

A couple notes
1) Yes 1 of 1 "multisig" is a valid output.  It can probably be combined with Pay2PubKey entries as the input format of both are identical.
2) It may be better to break the types down to separate uncompressed and compressed.  With multisig they can be mixed so there are multiple combinations (i.e. for x of 3: UUU, UUC, UCC, CCC).
3) Scripts are limited to 10,000 bytes so for type 0 that puts an upper bound on the length of unknown scripts.  In practice scripts are much smaller on average.
4) Unknown (type 0) could be broken out to identify more discrete templates.

All unspendable outputs can be dropped.  Outputs matching the following templates are unspendable (either by design or error) and can be dropped from the snapshot.bin.  There is no reason to keep them anymore than spent output but it doesn't make much difference as there are less than 2K outputs currently in the blockchain.
Code:
<Unknown:32>
<Unknown:36>
OP_DUP OP_HASH160 OP_0 OP_EQUALVERIFY OP_CHECKSIG
OP_IFDUP OP_IF OP_2SWAP OP_VERIFY OP_2OVER OP_DEPTH
OP_RETURN
OP_RETURN <Payload:0-40>


An alternative idea is to recognize that complex scripts (including native multisig) have different properties and thus treat them differently.  Pay2PubKeyHash, Pay2PubKey, Pay2ScriptHash make up the majority of the outputs (>99.97% by total created outputs lets see how it holds up by value) and they have a single identifying characteristic (as hash or key).  To normalize them and reduce the size, a hash could be taken of the PubKeys for formats 2 & 4.  This would make all 4 formats (1-4)  representable by a single hash and value.  

All other outputs could just be stored as raw scripts.  While it may not be the most efficient, remember they make up 0.03% of outputs and that will probably shrink due to P2SH, so how you handle the top 4 templates is going to determine the bulk of the snapshot ledger size.

Code:
0x04E <num_entries:4> // single identify claimants (>99.97% of outputs, Pay2PubKeyHash, Hash(Pay2PubKey), Pay2ScriptHash)
<IdentifierHash:20> <value:8>  //all identifiers in sorted order

0xB4 <mum_entries:4> // complex script outputs (to include native multisig - upper limit is ~30,000 entries)
<ScriptLength: 2> <PubKeyScript: ScriptLength>


As a side note it would have been useful if Bitcoin (or any derived coin) had moved all scripting to the tx inputs and kept all outputs as just script hashes (i.e. <ScriptHash:20> vs OP_HASH160 <ScriptHash:20> OP_EQUAL).  It would make transactions smaller by removing excess opcodes, and more importantly by shifting the "weight" of the tx to the input it would reduce the size of the UXTO (which records the set of outputs.  As a side benefit it would make things like snapshot.bin easier and lighter.  Case in point the current UXTO spends ~35 bytes per output.  If it could be reduced to a output hash (20 bytes), and value stored as varint (4 bytes average) it would reduce the size of UXTO by roughly a third.  Bitcoin could move towards a system like that by soft fork (make non P2SH output non-standard after certain point) and the UXTO would be normalized over time as non-P2SH outputs are spent and replaced with P2SH ones.  Just something for future altcoin developers to consider.
Peter R
Legendary
*
Offline Offline

Activity: 1162
Merit: 1007



View Profile
June 09, 2014, 09:57:53 PM
Last edit: June 09, 2014, 10:12:46 PM by Peter R
 #323


Pay2PubKey and native multisig outputs use PubKeys not PubKeyHashes and they are 33 or 65 bytes not 20 bytes.  Only ScriptHashes and PubKeyHashes are 20 bytes.  You could either handle this by treating compressed (33 bytes ea) and uncompressed (65 bytes ea) PubKeys as a separate type ("tag" in your format) or by just parsing the first byte (0x2 or 0x3 = compressed = 32 bytes follow, 0x4 = uncompressed = 64 bytes follow).


Thanks D&T.  I thought the native multi-sig section was incomplete Smiley

One quick question though (before I study your post in more detail): for pay2PubKey, can we not hash the pubKeys down to 20 bytes and only store the hash in the snapshot.bin file?  I actually thought that was your suggestion, but I must have misinterpreted.  I think this should work for both SCV and FCV because in both cases the verifier will be able to extract the pubKey.  If they can look up the pubKey in snapshot.bin, I don't see why they can't look up the pubKey hash instead.  


EDIT: What you describe here is what I was hoping to do for Pay2PubKey (to make it consistent with Pay2PubKeyHash and Pay2ScriptHash):

An alternative idea is to recognize that complex scripts (including native multisig) have different properties and thus treat them differently.  Pay2PubKeyHash, Pay2PubKey, Pay2ScriptHash make up the majority of the outputs (>99.9%) and they have a single identifying characteristic (as hash or key).  To normalize them and reduce the size, a hash could be taken of the  PubKeys for formats 2 & 8 4?.  This would make all 4 formats (1-4)  representable by a single hash and value.   Then store all other outputs as raw scripts. The good news is there are less than 30,000 valid outputs (unspent and spent) which are not Pay2PubKeyHash, Pay2PubKey or Pay2ScriptHash.  This represents the tail end (~1%) of the distribution and w/ P2SH it is likely to get smaller as a % of the UXTO.  Having those templates in a raw form is less of a concern.

Run Bitcoin Unlimited (www.bitcoinunlimited.info)
DeathAndTaxes
Donator
Legendary
*
Offline Offline

Activity: 1218
Merit: 1005


Gerald Davis


View Profile
June 09, 2014, 11:38:38 PM
Last edit: June 10, 2014, 03:24:27 PM by DeathAndTaxes
 #324

One quick question though (before I study your post in more detail): for pay2PubKey, can we not hash the pubKeys down to 20 bytes and only store the hash in the snapshot.bin file?  I actually thought that was your suggestion, but I must have misinterpreted.  I think this should work for both SCV and FCV because in both cases the verifier will be able to extract the pubKey.  If they can look up the pubKey in snapshot.bin, I don't see why they can't look up the pubKey hash instead.

Yes it would be possible, I think I just wasn't clear.  The PubKey is not recorded in the parent tx input, so for SCV some form of PubKey recovery will be needed.  ECDSA does support PubKey recovery, Bitcoin just doesn't use it in transactions (why Satoshi why? Smiley ).  Given a signed bitcoin transaction, the PubKey can be recovered, hashed and compared to the ledger.  The more I think about it there is probably no reason to not do this way (and just store other outputs as raw scripts).

I wouldn't separate out the entries which are a hash of the PubKey pulled from the UXTO from the entries which are a PubKeyHash from the UXTO (i.e. Section A & Section B).

For example the output script for tx 0e3e2357e806b6cdb1f70b54c3a3a17b6714ee1f0e68bebb44a74b1efd512098, output 0 is:
0496b538e853519c726a2c91e61ec11600ae1390813a627c66fb8be7947be63c52da7589379515d 4e0a604f8141781e62294721166bf621e73a82cbf2342c858ee OP_CHECKSIG

If we take the HASH160 (which is RIPEMD-160(SHA-256(PubKey)) we get
119b098e2e980a229e139a9ed01a469e518e6f26

If this is the only output for that PubKey it can be represented in the ledger under "section A" as
119b098e2e980a229e139a9ed01a469e518e6f26 000000012A05F200


In essence we have created a Pay2PubKeyHash entry from each Pay2PubKey output in the UXTO, making Section B redundant.

So two final questions.
1) Does it make sense to store PubKeyHashes and ScriptHashes separately over a more universal IdentifierHash?
2) Does it make sense to decode native multisig outputs into "templates" instead of storing them as raw scripts (Section E)?  They represent only 0.03% of all outputs (hope to get the time to parse and determine % of value this weekend).

There is no right answer just trying to think ahead to what makes sense and what is gained or lost over one option compared to another.  The answers of those questions determine how to update your format.
smooth
Legendary
*
Offline Offline

Activity: 2044
Merit: 1077



View Profile
June 10, 2014, 10:56:00 AM
 #325


And as I said, it is still likely to be questioned if the developer is uniquely "well-prepared" to mine and ends up mining a disproportionate amount because others are unaware of it or not uniquely well-prepapred (sometimes compounded by a low starting difficulty and bad difficulty adjustment algorithms that give a huge advantage to all miners -- but especially developers -- who get in early).


I think our goals in this thread should be to establish the spin-off technique so that any mining shenanigans become transparent (e.g., by promoting the use of the blockhash timestamp in the header file).  Once this is done, I don't see how mining can be controlled any further.  The developers could say they didn't mine upon launch, but they could do so anyways without anyone knowing.  And I still question whether this would even matter.  

Do you disagree?

I'm not sure which part you are asking about. I think the mining marketplace and the developer's role in that marketplace could matter, but I also agree that is not really relevant to this thread.
smooth
Legendary
*
Offline Offline

Activity: 2044
Merit: 1077



View Profile
June 10, 2014, 11:05:59 AM
 #326

But would this be ethical?  It sounds like Smooth might be inclined to say "no."  But like you implied, creating ASICs ahead of time would require a huge investment that might become wortheless if the experiment failed.  

I'm not making an ethical judgement here, I'm just saying that people will view this largely the same as a premine, because it is.

The DRK example keeps coming up because of the alleged extreme instamine of supposedly 50%. You could in some sense do a spin-off and then instamine so many coins (say 50% of the remainder) that that the spun-off coins become insignificant, or at least much less significant.

This has little to do with the spin-off technique though. There are all kinds of crappy things you can do with a coin after the spin off.
DeathAndTaxes
Donator
Legendary
*
Offline Offline

Activity: 1218
Merit: 1005


Gerald Davis


View Profile
June 12, 2014, 08:20:33 PM
Last edit: June 12, 2014, 09:12:48 PM by DeathAndTaxes
 #327

So I started working on a UXTO parser instead of a blockchain parser.  The UXTO contains all the information that is needed to make the snapshot ledger and it is far more compact (~700MB vs 15GB).  The UXTO (contained in chainstate folder = leveldb database) is the set of unspent outputs as of the current block.  To be useful the parser would need to the ability to be created from an arbitrary block but for now I am just working from the current block.

The format is not well documented and rather dense but I am making sense of it.
https://bitcointalk.org/index.php?topic=647198.msg7243116#msg7243116

Working through the UXTO format in the core client, one thing I realized is that even if we don't take a hash of the PubKeys we only need to store 33 bytes even for uncompressed PubKeys.

For example given a PubKey 0496b538e853519c726a2c91e61ec11600ae1390813a627c66fb8be7947be63c52da7589379515d 4e0a604f8141781e62294721166bf621e73a82cbf2342c858ee

The 0x04 prefix indicates it is uncompressed which means pefix byte followed by 32 bytes for x value and 32 bytes for the y value.  The y values can be computed from the x value and the secp256k1 curve so dropping the y value can save us 32 bytes.  This won't be confused with an actual compressed PubKey as they will have a prefix of 0x02 or 0x03.

Store:
0496b538e853519c726a2c91e61ec11600ae1390813a627c66fb8be7947be63c5201

Compute y value:
da7589379515d4e0a604f8141781e62294721166bf621e73a82cbf2342c858ee

Reconstruct Uncompressed PubKey:
0496b538e853519c726a2c91e61ec11600ae1390813a627c66fb8be7947be63c52da7589379515d 4e0a604f8141781e62294721166bf621e73a82cbf2342c858ee
Peter R
Legendary
*
Offline Offline

Activity: 1162
Merit: 1007



View Profile
June 12, 2014, 09:26:09 PM
 #328

So I started working on a UXTO parser instead of a blockchain parser.

Yes, I've been following.  This is great!  I agree that it's probably best for the parser to work off the UXTO set as well.  

Quote
...one thing I realized is that even if we don't take a hash of the PubKeys we only need to store 33 bytes even for uncompressed PubKeys.

You obviously have way more experience than me with the bitcoin technical details, but I thought the reason there were two forms for compressed public keys 02<x> and 03<x> was because there are two possible y values for a given x value.  Aren't you losing a small amount of information this way?  And would there be anything wrong with converting uncompressed public keys into the standard compressed format?


Run Bitcoin Unlimited (www.bitcoinunlimited.info)
DeathAndTaxes
Donator
Legendary
*
Offline Offline

Activity: 1218
Merit: 1005


Gerald Davis


View Profile
June 13, 2014, 01:01:24 AM
 #329

You obviously have way more experience than me with the bitcoin technical details, but I thought the reason there were two forms for compressed public keys 02<x> and 03<x> was because there are two possible y values for a given x value.  Aren't you losing a small amount of information this way?  And would there be anything wrong with converting uncompressed public keys into the standard compressed format?

Good point.  Need to dig deeper.  Actually I am pretty sure if uses a prefix of 0x05 for half the compressed uncompressed keys to make that distinction.  I haven't checked the code yet but I think that is how it handles it (and it may explain the mystery of the 0x05 prefix).  You might be right about just compressing all keys and using a 0x02 and 0x03 prefix.   Off the top of my head I can't think of a reason why that wouldn't work but they didn't go that route in the UXTO and I assume there is a reason for that.  Back to digging into the guts of the code again.  

As a side not, the hilarity of "compressed uncompressed keys" Smiley  Please future altcoin developers just use compressed keys.  Period.  The protocol has no understanding of anything but compressed keys.
DeathAndTaxes
Donator
Legendary
*
Offline Offline

Activity: 1218
Merit: 1005


Gerald Davis


View Profile
June 13, 2014, 05:38:22 AM
Last edit: June 13, 2014, 04:14:04 PM by DeathAndTaxes
 #330

Need some sleep but here is a preview of UXTO Parser.

Code:
E:\Repos\UxtoParser\UxtoParser.Demo\bin\Debug>UxtoParser.Demo.exe decodetx 14015bd586c0c7a28979ca294b114441f23bfc97be17cd6077b9e12e2709fec3
Raw Key:
63c3fe09272ee1b97760cd17be97fc3bf24144114b29ca7989a2c7c086d55b0114

Raw Value:
010c013600946cb2e08075bcbaf157e47bcb67eb2b2339d24268800f514104e6da9c60084b43d28266243c636bcdaf4d8f17b5954e078d2dece7d4659e0dee3419a40b939c24ac813c692a323ca5207a6fb387ffe28e48f706c95dbf46648f210226cb0561011d9045f6371cb09086ba7148d9942328bcf135ccdda921022eac137ab02d826df0af54e92a352945c9892df6cd77f1a7c390fc82c8b0edea53ae8f9d01

Decoded Record
Version: 1
Code: 12 (IsCoinbase?: False  OutZeroSpendable?: False OutOneSpendable?: True BitmaskLen: 1)
Bitmask: 0x01
BlockHeight: 265985

TxId : Index                                                                Value               Type  SLen Script
14015bd586c0c7a28979ca294b114441f23bfc97be17cd6077b9e12e2709fec3:1           6000         PubKeyHash    20 946cb2e08075bcbaf157e47bcb67eb2b2339d242
14015bd586c0c7a28979ca294b114441f23bfc97be17cd6077b9e12e2709fec3:2          12000          RawScript   137 514104e6da9c60084b43d28266243c636bcdaf4d8f17b5954e078d2dece7d4659e0dee3419a40b939c24ac813c692a323ca5207a6fb387ffe28e48f706c95dbf46648561011d9045f6371cb09086ba7148d9942328bcf1dd78cb6edb35ccdda921022eac137ab02d826df0af54e92a352945c9892df6cd77f1a7c390fc82c8b0edea53ae

Code:
E:\Repos\UxtoParser\UxtoParser.Demo\bin\Debug>UxtoParser.Demo.exe decodetx e6556c82129245c5a5bc1e71151588c3c8ef5e86abd41940ad61178f5374a652
Raw Key:
6352a674538f1761ad4019d4ab865eefc8c3881515711ebca5c5459212826c55e6

Raw Value:
010a118edad22b005a25e8152cdeae9419b34595104e874bf93a8f0183a4a7550040d2918f30a7d55e8b7000983b24b1116179e27a838488be7f00415435a335bc5aa1417bcdd8dea10ce725ed7aaf91cd68

Decoded Record
Version: 1
Code: 0x10 (IsCoinbase?: False  OutZeroSpendable?: True OutOneSpendable?: False BitmaskLen: 1)
Bitmask: 0x11
BlockHeight: 305000

TxId : Index                                                                Value               Type  SLen Script
e6556c82129245c5a5bc1e71151588c3c8ef5e86abd41940ad61178f5374a652:0        3662099         PubKeyHash    20 5a25e8152cdeae9419b34595104e874bf93a8f01
e6556c82129245c5a5bc1e71151588c3c8ef5e86abd41940ad61178f5374a652:2        1000003         PubKeyHash    20 40d2918f30a7d55e8b7000983b24b1116179e27a
e6556c82129245c5a5bc1e71151588c3c8ef5e86abd41940ad61178f5374a652:6      120487026         PubKeyHash    20 415435a335bc5aa1417bcdd8dea10ce725ed7aaf
DeathAndTaxes
Donator
Legendary
*
Offline Offline

Activity: 1218
Merit: 1005


Gerald Davis


View Profile
June 13, 2014, 04:57:11 PM
Last edit: June 17, 2014, 11:01:10 PM by DeathAndTaxes
 #331

You obviously have way more experience than me with the bitcoin technical details, but I thought the reason there were two forms for compressed public keys 02<x> and 03<x> was because there are two possible y values for a given x value.  Aren't you losing a small amount of information this way?  And would there be anything wrong with converting uncompressed public keys into the standard compressed format?

So I can confirm that uncompressed keys are stored in compressed format using 0x04 and 0x05 which work the same as 0x02 and 0x03 do for compressed keys to identify even or odd.

Code:
TxId : Index                                                         Block#          Value               Type  SLen Script
0e3e2357e806b6cdb1f70b54c3a3a17b6714ee1f0e68bebb44a74b1efd512098:0        1     5000000000 PubKeyUncompressed    33 0496b538e853519c726a2c91e61ec11600ae1390813a627c66fb8be7947be63c52
9b0fc92260312ce44e74ef369f5c66bbb85848f2eddd5a7a1cde251e54ccfdd5:0        2     5000000000 PubKeyUncompressed    33 057211a824f55b505228e4c3d5194c1fcfaa15a456abdf37f9b9d97a4040afc073

To decode:
If prefix is 0x04, the next 32 bytes are the x value, compute the even y value and append to produce the 65 byte uncompressed key.
If the prefix is 0x05, the next 32 bytes are the x value, compute the odd y value, append, and change prefix byte to 0x04 to produce the 65 byte uncompressed key.

Compressed -> Uncompressed
046b538e853519c726a2c91e61ec11600ae1390813a627c66fb8be7947be63c52 -> 0496b538e853519c726a2c91e61ec11600ae1390813a627c66fb8be7947be63c52da7589379515d4e0a604f8141781e62294721166bf621e73a82cbf2342c858ee
057211a824f55b505228e4c3d5194c1fcfaa15a456abdf37f9b9d97a4040afc073     -> 047211a824f55b505228e4c3d5194c1fcfaa15a456abdf37f9b9d97a4040afc073dee6c89064984f03385237d92167c13e236446b417ab79a0fcae412ae3316b77

DeathAndTaxes
Donator
Legendary
*
Offline Offline

Activity: 1218
Merit: 1005


Gerald Davis


View Profile
June 13, 2014, 10:58:25 PM
 #332

Distribution of unspent outputs (haven't combined entries to the same identity).  RawScript includes "native" multisig.  It hasn't been broken out yet.

Code:
Type                 NumOutputs    OutputValue  AvgScriptLen
------------------------------------------------------------
PubKeyHash           11,236,706    11,018,046            20
PubKey                   41,665     1,844,048            33
ScriptHash               19,222        17,719            20
RawScript                45,160         2,627           121

UXTO Records:    3,297,375
Unique Outputs: 11,342,753
Output Value:   12,882,440 BTC

As expected the PubKeyHash, PubKey, and ScriptHash outputs combined make up 99.98% of the outstanding coins.
Peter R
Legendary
*
Offline Offline

Activity: 1162
Merit: 1007



View Profile
June 14, 2014, 12:26:16 AM
 #333

Distribution of unspent outputs (haven't combined entries to the same identity).  RawScript includes "native" multisig.  It hasn't been broken out yet.

Code:
Type                 NumOutputs    OutputValue  AvgScriptLen
------------------------------------------------------------
PubKeyHash           11,236,706    11,018,046            20
PubKey                   41,665     1,844,048            33
ScriptHash               19,222        17,719            20
RawScript                45,160         2,627           121

UXTO Records:    3,297,375
Unique Outputs: 11,342,753
Output Value:   12,882,440 BTC

As expected the PubKeyHash, PubKey, and ScriptHash outputs combined make up 99.98% of the outstanding coins.

Nice work!  I don't know if anyone's ever compiled this information before, do you?  

I am actually surprised that there's only 17,719 BTC (0.138%) in P2SH outputs.  I would have thought some very large accounts (e.g., funds held by BitStamp) would be secured using multisig; however, this shows that's probably not the case.  

Here's how it looks on a pie chart:


Run Bitcoin Unlimited (www.bitcoinunlimited.info)
DeathAndTaxes
Donator
Legendary
*
Offline Offline

Activity: 1218
Merit: 1005


Gerald Davis


View Profile
June 17, 2014, 11:00:06 PM
Last edit: June 18, 2014, 12:07:36 AM by DeathAndTaxes
 #334

It is somewhat surprising.  I would have imagined that the cold storage of major exchanges would be secured by m of n  P2SH scripts.  We know that none of them do that simply because the total value encumbered by all ScriptHashes & Raw scripts is <0.16% of all BTC.

Haven't had much free time to do more analysis of the blockchain but found an interesting test case in this tx:
https://blockchain.info/tx/5143cf232576ae53e8991ca389334563f14ea7a7c507a3e081fbef2538c84f6e

This tx has (or had as of the snapshot block) 3073 unspent outputs.  It is the largest tx when measured by the number of unspent outputs.
DeathAndTaxes
Donator
Legendary
*
Offline Offline

Activity: 1218
Merit: 1005


Gerald Davis


View Profile
June 30, 2014, 03:55:03 PM
Last edit: June 30, 2014, 11:25:53 PM by DeathAndTaxes
 #335

I have been too busy to do as much with this in the past two weeks but I have made some progress on a UTXO parser.  I believe by using PubKey recovery the storage of the snapshot can be greatly simplified.

PubKeyHash outputs are already in hashed format.
PubKey outputs can be reduced to PubKeyHash. (13 bytes per record savings)
Native Multisig outputs can be reduced to a set of PubKeyHashes and identifier will indicate the number of keys (13 to 39 bytes per record savings).
1-of-1 multisig outputs can be reclassified as PubKeyHash.

This also has a minor additional cost savings in combining outputs of the same key in different formats (i.e. payments to PubKey and PubKeyHash of the same key).  The parser is still a work in progress but it looks like the snapshot would be ~50MB (exclude outputs below dust threshold, hash all keys, store values as varint, use identifier to id template, and store unrecognized raw scripts).  Time to parse (with highly unoptimized debug code) is about 30 minutes.  Optimized it should be significantly faster but I was happy to see first iteration was less than an hour.


Breakdown with standard multisig scripts broken out of raw scripts

Code:
Type           Num_Outputs   Value_Outputs        Length
------------------------------------------------------------
PubKeyHash     11,236,706    11,018,045.72853340      20
PubKey           41,665     1,844,047.73760491      33
ScriptHash         19,222        17,719.13794714      20
NativeMultiSig   44,440            12.91792116     122
RawScript             720*        2,614.26902186      68

UXTO Records:         3,297,375
Unique Outputs:      11,342,753
Output Value (BTC):  12,882,439.7910285
* Of the 720 as of yet unidentified raw scripts there are only 259 with a value of >1 satoshi. 



Breakdown of Native Multisig outputs
Code:
Type                 Num_Outputs   Value_Outputs  Length
------------------------------------------------------------
NativeMultiSig 1Of1          375      0.03712087   37
NativeMultiSig 1Of2       26,771      3.08014357   75
NativeMultiSig 1Of3       17,268      0.31637593  198
NativeMultiSig 2Of2           11      0.02995760   97
NativeMultiSig 2Of3           11      9.35391400  139
NativeMultiSig 3Of3            2      0.10000000  105

Unique identifiers (combine all outputs to the same "address")
Code:
Distribution           NumClaims  TotalClaimValue
-----------------------------------------------------
Total Claims           3,013,999  12,882,439.79102850
Claims <  5.43 bits      737,239           0.16200128
Claims < 10.00 bits      753,238           0.28801118
Claims < 54.30 bits      846,142           3.02013345
Claims <100.00 bits      988,422          12.77187202
 
Claims >=   5.43 bits  2,276,760  12,882,439.62902720
Claims >=  10.00 bits  2,260,761  12,882,439.50301730
Claims >=  54.30 bits  2,167,857  12,882,436.77089500
Claims >= 100.00 bits  2,025,577  12,882,427.01915650

In the above distribution no redundancy check was done.  To further compact the output set one could:
1) organize all multi-sig outputs in the same order (sort by PubKey or PubKeyHash)
2) hash PubKeys and compare against existing PubKeyHashes
3) convert all 1 of 1 multisig to Pay2PubKeyHash and check against existing keys.  
4) Check for uncompressed and compressed versions of the same key
5) Remove obviously unspendable outputs

DeathAndTaxes
Donator
Legendary
*
Offline Offline

Activity: 1218
Merit: 1005


Gerald Davis


View Profile
July 03, 2014, 03:30:09 PM
Last edit: July 03, 2014, 05:12:09 PM by DeathAndTaxes
 #336

I removed zero value outputs, excluded 20,412 provably unspendable outputs, and consolidated outputs to the same signer into a single claim.  Here are some updated distributions (which if you are like me and like stats may find interesting beyond just this concept).  

All stats are as of  block 305,303 but I doubt it has changed significantly through the most recent block.  I would expect the number of non-standard scripts, native-multisig, and PubKey outputs to decline as they are spent and replaced with an increasing number of P2PkH and P2SH outputs which means that snapshots should become more efficient over time.

Code:
Distribution                      Num Claims             Claim Value        
------------------------------------------------------------------------
Valid Claims                      2,983,463      12,879,829.77705084 BTC
Invalid Claims                       16,744           2,610.01397770 BTC
Total Claims                      3,000,207      12,882,439.79102854 BTC

Code:
         Distribution            Num Claims             Claim Value             Pct of Valid        Pct of Total
-----------------------------------------------------------------------------------------------------------------
     0.01 to       0.10 bits        470,881               0.00725863 BTC             0.0000 %            0.0000 %
     0.11 to       1.00 bits        238,560               0.06057174 BTC             0.0000 %            0.0000 %
     1.01 to       5.43 bits        215,689               0.06049534 BTC             0.0000 %            0.0000 %
     5.44 to      10.00 bits         35,806               0.09588866 BTC             0.0000 %            0.0000 %
    10.01 to      54.30 bits         49,355               0.22008615 BTC             0.0000 %            0.0000 %
    54.31 to     100.00 bits        234,955              12.46195347 BTC             0.0001 %            0.0001 %
   100.01 to   1,000.00 bits        447,226             166.14026722 BTC             0.0013 %            0.0013 %
 1,000.01 to  10,000.00 bits        432,314           1,535.80230367 BTC             0.0119 %            0.0119 %
10,000.01 to 100,000.00 bits        541,229          16,031.22170978 BTC             0.1245 %            0.1244 %
     >0.1 BTC to       1 BTC        275,469          96,829.33301374 BTC             0.7518 %            0.7516 %
      >1  BTC to      10 BTC        203,095         572,033.28805785 BTC             4.4413 %            4.4404 %
     >10  BTC to     100 BTC         98,567       3,531,749.15046627 BTC            27.4208 %           27.4152 %
    >100  BTC to   1,000 BTC         13,169       2,997,803.73298225 BTC            23.2752 %           23.2705 %
  >1,000  BTC to  10,000 BTC          1,415       3,111,492.00522526 BTC            24.1579 %           24.1530 %
>10,0000  BTC to 100,000 BTC             97       2,242,337.41607392 BTC            17.4097 %           17.4062 %
>100,000 BTC                              2         309,838.93715729 BTC             2.4056 %            2.4051 %

Code:
          Minimum Claim          Num Claims             Claim Value             Pct of Valid        Pct of Total
-----------------------------------------------------------------------------------------------------------------
                   0.01 bits      2,983,463      12,879,829.77705084 BTC           100.0000 %           99.9797 % (all valid claims)
                   0.10 bits      2,512,582      12,879,829.76979221 BTC           100.0000 %           99.9797 %
                   1.00 bits      2,296,893      12,879,829.70929687 BTC           100.0000 %           99.9797 %
                   5.43 bits      2,263,537      12,879,829.61522062 BTC           100.0000 %           99.9797 % (new dust treshold)
                  10.00 bits      2,247,538      12,879,829.48921072 BTC           100.0000 %           99.9797 % (new minimum fee)
                  54.30 bits      2,154,634      12,879,826.75708845 BTC           100.0000 %           99.9797 % (old dust treshold)
                 100.00 bits      2,012,583      12,879,817.02725725 BTC            99.9999 %           99.9796 % (new minimum fee)
               1,000.00 bits      1,565,357      12,879,650.88699003 BTC            99.9986 %           99.9784 % (1 mBTC)
              10,000.00 bits      1,133,043      12,878,115.08468636 BTC            99.9867 %           99.9664 %
             100,000.00 bits        591,814      12,862,083.86297658 BTC            99.8622 %           99.8420 %
           1,000,000.00 bits        316,345      12,765,254.52996284 BTC            99.1104 %           99.0903 % (1 BTC)
          10,000,000.00 bits        113,250      12,193,221.24190499 BTC            94.6691 %           94.6499 % (10 BTC)
         100,000,000.00 bits         14,683       8,661,472.09143872 BTC            67.2483 %           67.2347 % (100 BTC)
       1,000,000,000.00 bits          1,514       5,663,668.35845647 BTC            43.9732 %           43.9643 % (1,000 BTC)
      10,000,000,000.00 bits             99       2,552,176.35323121 BTC            19.8153 %           19.8113 % (10,000 BTC)
     100,000,000,000.00 bits              2         309,838.93715729 BTC             2.4056 %            2.4051 % (100,000 BTC)
   1,000,000,000,000.00 bits              0               0.00000000 BTC             0.0000 %            0.0000 % (1,00,000 BTC)

A large portion of the "custom scripts" can be refactored to be a standard template or they are invalid.  I now have only 20 scripts which are potentially valid and can't be reclassified.  Another interesting thing is there are a large number of native-multisig outputs which have one or more invalid PubKeys.  This allows them to be refactored.   I parsed all multisig outputs and removed invalid pubkeys.  If the number of valid keys is less than m in m-of-n then the output is unspendable.  If the number of valid keys is reduced to 1 and the script is 1-of-n then it can be refactored to a Pay2PubKeyHash script.  I haven't hashed PubKeys (to convert them to PubKeyHashes) yet.  Doing so may allow further consolidation of the output set (for keys where there are both PubKeyHash and PubKey outputs for the same key).

The size of the snapshot will depend heavily on the min claim value used. If we assume the average record is 24 bytes (20 byte identifier and 4 byte for value) that would make a full blockchain snapshot ~72MB.  It would be up to the spinoff developer but I believe to reduce size a min claim of 10, 100 or 1,000 bits is defendable.   That would result in a snapshot of 54 MB (25% reduction), 48 MB (33% reduction), and 38 MB (47% reduction) respectively.  Even with a minimum of 1,000 bits the included claims represents > 99.99% of the spendable BTC in the blockchain.

DeathAndTaxes
Donator
Legendary
*
Offline Offline

Activity: 1218
Merit: 1005


Gerald Davis


View Profile
July 03, 2014, 11:00:14 PM
 #337

One last update before the weekend.  Compared to the one above I hashed all PubKeys to their PubKeyHashes.  This has the effect of saving 13 bytes per record and it reduces the total number of records by another ~1,600.

Code:
     Compact Distribution     Num Claims             Claim Value
---------------------------------------------------------------------
             Valid Claims      2,981,667      12,879,829.77705084 BTC
           Invalid Claims         16,744           2,610.01397770 BTC
             Total Claims      2,998,411      12,882,439.79102854 BTC

Code:
       Valid Distribution     Num Claims             Claim Value              Pct of Valid        Pct of Total
--------------------------------------------------------------------------------------------------------------
               PubKeyHash      2,962,366      12,862,099.17235302 BTC            99.8623 %           99.8421 %
               ScriptHash         14,252          17,719.13794714 BTC             0.1376 %            0.1375 %
           NativeMultiSig          5,033              10.44409149 BTC             0.0001 %            0.0001 %
                RawScript             16               1.02265919 BTC             0.0000 %            0.0000 %
               
     Invalid Distribution     Num Claims             Claim Value              Pct of Valid        Pct of Total
--------------------------------------------------------------------------------------------------------------
       UnspendableOpFalse              1           2,609.36304319 BTC             0.0203 %            0.0203 %
     UnspendableP2PoolBug              1               0.60280235 BTC             0.0000 %            0.0000 %
 UnspendableInvalidOpcode             20               0.04530933 BTC             0.0000 %            0.0000 %
 UnspendableInvalidPubKey         16,696               0.00242283 BTC             0.0000 %            0.0000 %
     UnspendablePushError              1               0.00040000 BTC             0.0000 %            0.0000 %
     UnspendableZeroValue             25               0.00000000 BTC             0.0000 %            0.0000 %

Code:
           Minimum Claim     Num Claims             Claim Value               Pct of Valid        Pct of Total
--------------------------------------------------------------------------------------------------------------
                0.01 bits      2,981,667      12,879,829.77705084 BTC           100.0000 %           99.9797 %
                0.10 bits      2,510,787      12,879,829.76979222 BTC           100.0000 %           99.9797 %
                1.00 bits      2,295,098      12,879,829.70929688 BTC           100.0000 %           99.9797 %
                5.43 bits      2,261,743      12,879,829.61522343 BTC           100.0000 %           99.9797 %
               10.00 bits      2,245,744      12,879,829.48921353 BTC           100.0000 %           99.9797 %
               54.30 bits      2,152,928      12,879,826.75869942 BTC           100.0000 %           99.9797 %
              100.00 bits      2,010,915      12,879,817.03184494 BTC            99.9999 %           99.9796 %
            1,000.00 bits      1,565,261      12,879,651.11951631 BTC            99.9986 %           99.9784 %
Zangelbert Bingledack
Legendary
*
Offline Offline

Activity: 1036
Merit: 1000


View Profile
July 04, 2014, 05:13:21 PM
Last edit: July 06, 2014, 02:10:35 AM by Zangelbert Bingledack
 #338

I'd like to refine the notion in the OP of "most efficient coin distribution," why Bitcoin has something vastly closer to it and why other coins can't hope to catch up anytime soon.

I think the answer requires a clear understanding of the crucial role investors play in society, so here I will attempt to explain that for those who've never considered it:

Simply put, a good investor can predict the future better than other people can. If, for example, you're a good investor and you predict a water shortage in a few years you can buy water rights now and have a good chance of selling them later for a profit. However, as a side effect, you bid up the price of water for everyone else, which incentivizes people to conserve water, incentivizes water producers to invest more into water production, motivates new companies to get into the water business, prompts more people to major in related fields in college, etc. The water shortage you correctly predicted will be mitigated by these actions. In this way, "greedy speculators" can even save lives. Shocked

As a good investor, you would likely sell when the water production infrastructure has become too big and society would benefit more from some of the resources/manpower being diverted elsewhere, and by pushing down the price you create exactly the right incentive structure for that. Being as most people aren't good at predicting the future and you are, you help society manage resources more efficiently than it would without you.

Not only do you help society manage resources more efficiently, even to the point of saving lives, but the better you are at that the more power you will gain to do it on a larger and larger scale since you keep getting wealthier and have more power to move markets. Think of how many lives Jim Rogers has probably saved in Africa by being such a successful and large-scale commodities investor! Conversely, if you're bad at predicting the future like most people are, you will tend to lose money and damage society - in the case of water by aggravating the shortages and bankrolling the oversupply gluts - but your ability to affect society will diminish as you lose your wealth and thereby effectively lose the "right" to divert resources in society.

In this way, the freer the market, the more it optimizes toward moving the ability to direct resources into the most capable hands: the most highly skilled investors (future prediction specialists) and the best entrepreneurs - and these people are exactly the people we want getting rich. Without them, we'd all be a lot less rich ourselves, or maybe even dead or never born because the resources for our parents to raise us weren't sufficient due to systemic mismanagement. Notice this is not centralized government management; every argument for "why we need central planners" is answered in the concepts touched on above: there is a huge distributed network of specialists at predicting the future and putting resources to the most efficient known uses. These are your planners, bringing all the benefits of having experts plan stuff, but not central planners in government offices.

Over time (not immediately), this process vests greater control over the societal ledger with those people who benefit society the most with their skills, and of course these skilled people are also motivated to do this because it directly benefits them by making them rich.  

Note that even short-term speculators and day traders - if they are good - also play a valuable role: they dampen short-term and even intraday volatility by buying the dips and selling the peaks. It's the unskilled speculators that create the volatility, but those people tend to lose their shirts pretty fast. That's a major reason why there's less volatility as a market matures, with the unskilled and therefore volatility-increasing speculators getting smaller and smaller while the skilled and therefore volatility-reducing speculators get bigger and bigger, thereby making the market ever smoother. (Note, however, that even though the Bitcoin market is maturing, it is not getting that much less volatile, simply because the market expands exponentially, so that the immature portion of it is always bigger than the older mature portion.)

Now Bitcoin investment is a little more abstract than water investments. We cannot say that investing in Bitcoin before it got scarce helped society avert a water shortage. However, it incentivized miners to get in and secure the network to a level commensurate with the market cap, it motivated people like Tony Gallippi to divert his talents and time toward starting BitPay, and it generally created the incentive structure to support many other similar infrastructural buildouts. Of course it also attracted media and other much-needed public attention, as well as that of academics and of course investors.

It is obvious from reading the speculation forum (and reddit) that only those with the conviction to hodl actually hodl, and those strongly tend to be people who both understand Bitcoin at a deep level (aren't spooked by FUD) and understand the necessary aspects of investing (aren't spooked by volatility because they understand the dynamics of the binary bet and exponential growth). When bitcoin went from under a dollar to over $30, then down to $2 over the course of a single year, the market was shaking out the weak hands - the poor speculators - in a spectacularly violent fashion. But what is this volatility optimizing for - that is, who's this market making rich? Here are some obvious ones:

1) Good hodlers, people who deeply grok Bitcoin and its investment aspects, who will likely dishoard in an organized fashion at peaks, dampening the bubbles and not contributing to the panic knifedowns nor to the FUD-based selloffs

2) Good traders who see the big picture, thereby decreasing the volatility

3) Perhaps warranting their own category are the people who have extremely deep or highly technical understanding of Bitcoin, to the level of what would almost seem to others like "insider information" (though there's nothing bad about this; in fact it's a good thing, if you follow the reasoning outlined above)

The main point is, besides some people who got lucky, the Bitcoin ledger is held mainly by people who most strongly believe in and most intimately understand Bitcoin (and understood it early!) and the relevant investing principles, or those who have actually contributed materially to the ecosystem. These people are far more likely than others to know where resources should be diverted to in the economy for maximum benefit to the community, and the maturity of the Bitcoin ledger ensures that they actually have the ability to command such resources (i.e., they have accumulated a lot of bitcoins to invest in infrastructure, and they have relatively little competition from unskilled investors who fund dead-end projects that waste dev time, etc.).

TL;DR: While the debate over whether the Bitcoin ledger is more or less "fair" than some other distribution may never be settled to everyone's satisfaction, we can clearly see that it is in a very useful sense the most efficient distribution for the general benefit of the community, thanks to the eminently helpful role that skilled investors play in society and how the extreme volatility over the years has continually shaken out the unskilled investors who would waste the community's resources.

If we imagine even a new amazing coin that is magically perfectly distributed to everyone, 100 coins for every person in the world, we can easily see that the initial investment decisions will be terribly skewed since all sorts of unskilled people will be throwing their coins at things and diverting resources and labor into useless, go-nowhere projects. Also, volatility would be way worse, and panic sell-offs due to FUD/misunderstandings would be horrendous. A ledger needs to grow organically through a market process, or else it will face challenges that it would only be ready to face some years later when the market is more solidified in strong hands by the above process.

By the way, are there any other things the Bitcoin ledger has optimized for (any other type of person the history of Bitcoin price movements has made especially rich (or especially shaken out))?
Peter R
Legendary
*
Offline Offline

Activity: 1162
Merit: 1007



View Profile
July 07, 2014, 06:12:08 AM
Last edit: July 07, 2014, 07:22:50 AM by Peter R
 #339

Gerald, you've really done a nice job parsing the UTXO.  Am I reading your results correctly that 99.9999% of the consolidated claims can be cast as either P2SH or P2PkH?  And that with a 72 MB snapshot file, we can retain ALL of the information related to the wealth distribution?  (Developers can then choose which dust limits they apply and whether to support complex scripts, reducing the size of the snapshot file even further).  

For those that might be interested, here is a bar chart of DeathAndTaxes' results.  You can clearly see that the vast majority of the claims represent only a small portion of the total bitcoin wealth; and conversely, a small portion of the claims represents the vast majority of the wealth.  Also note the "dent" in the distribution due to the dust limit.  If the dust-limit had been in effect since 2009, we'd have a much cleaner UTXO set.  (I expect those ~million small claims to the left of the yellow band were probably the result of people experimenting on the main network rather than the testnet).


Run Bitcoin Unlimited (www.bitcoinunlimited.info)
DeathAndTaxes
Donator
Legendary
*
Offline Offline

Activity: 1218
Merit: 1005


Gerald Davis


View Profile
July 07, 2014, 04:41:09 PM
Last edit: July 07, 2014, 05:37:10 PM by DeathAndTaxes
 #340

Gerald, you've really done a nice job parsing the UTXO.  Am I reading your results correctly that 99.9999% of the consolidated claims can be cast as either P2SH or P2PkH?  And that with a 72 MB snapshot file, we can retain ALL of the information related to the wealth distribution?  (Developers can then choose which dust limits they apply and whether to support complex scripts, reducing the size of the snapshot file even further).  

Correct.  It would be roughly 70 MB depending on the specific format used. The raw script and native multisig claims are a rounding error and won't significantly affect the size of the snapshot. There are ~3 million claims so the size will depend on the size of the P2SH and P2PkH claim records.  At a minimum this would be 20 bytes for the identifier plus 2 to 3 bytes avg (varint) to represent the value.  At most it would be 29 bytes per record (1 byte for type, 20 byte identifier, 8 bytes for value).  

One final optimization option would be to refactor all the native multisig outputs to their P2SH equivalents.  Users could still make claims in a trust free manner but the original scripts should probably be retained in a separate file (probably along with all other refactored outputs).  The refactor file wouldn't be needed by clients but it would be useful for web services which assist users in identifying if they have a claim and what steps they need to take to redeem that claim.

For the record here are the only "valid" unspent outputs which are still raw scripts.  This just means I haven't yet been able to provably show them as unspendable or refactor them to either a Pay2PubKeyHash or Native-Multisig claims.  It probably is possible to reduce this list of exceptions further.

Code:
TxId:Index                                                            Type                   Size          Value
----------------------------------------------------------------------------------------------------------------
0ea6987a7dd49887bea87ae2106e4ef99d29799047a8fb10906be93ba0cbc719:0    RawScript               173         100000
66747648ef92d78bb36c267c0f444e6df96e44f463a2f92541483fcb8e882b27:1    RawScript                54          10000
9837a637931f74df1cb52b1045e479e4d7065f72db4d449d732211eb0e5cfd4c:0    RawScript               180          55000
9837a637931f74df1cb52b1045e479e4d7065f72db4d449d732211eb0e5cfd4c:1    RawScript               180          55000
af32bb06f12f2ae5fdb7face7cd272be67c923e86b7a66a76ded02d954c2f94d:0    RawScript                35      100000000
c4b46c5d88327d7af6254820562327c5f11b6ee5449da04b7cfd3710b48b6f55:0    RawScript                35          20000
1890467d5d65519a586ccd16e9e3a893a803467cd28941591e72bc99dec9ac5d:0    RawScript               174         100000
9f17f3ce43019c24baa6d679edfdddeada856f617cd9c1f6008d49be4542b768:0    RawScript               364        1100000
cd2dacbd05389580cb569985b3a8b1db67ea6cc84371223590e241a5026d0a8a:0    RawScript               209          15000
09857a1abf60f04971af09b36be4fb0d0a6143bc0af9c4cfcebf99ed6f97df9a:1    RawScript                60          10000
fb01987b540ec286973aac248fab643de82813af452d958056fee8de9f4535ab:0    RawScript                35          20000
daae4951c39c0198166a1d9e2eef542f17412883710567a22ad8dccbe2476ab9:0    RawScript               180          60000
daae4951c39c0198166a1d9e2eef542f17412883710567a22ad8dccbe2476ab9:1    RawScript               180          60000
702c36851ed202495c2bec1dd0cefb448b50fafd3a5cdd5058c18ca53fc2c3d1:0    RawScript                35          20000
faf8989ed87c5a667a1ead813aea718727e01767c124193297eaf409ff4645e5:1    RawScript                35         500000
75bb6417afc7500a6389201a67bfc2428a1241170a214bbf6833a389191036fe:0    RawScript               182          55000
75bb6417afc7500a6389201a67bfc2428a1241170a214bbf6833a389191036fe:1    RawScript               182          55000

To refactor valid but not non-standard outputs and to remove invalid outputs I first reduced the raw scripts in the UTXO to a set of templates.  Then updated the parser to handle these templates.  The remaining raw script outputs are one of the following templates which the parser handles as "unknown" and keeps as a raw script.

Code:
OP_HASH256 <data:32> OP_EQUAL
OP_DUP OP_HASH256 <data:32> OP_EQUAL OP_NOTIF OP_HASH256 <data:32> OP_EQUAL OP_ENDIF <data:x> OP_DROP
OP_SHA256 <data:32> OP_EQUAL OP_SWAP <pubkey:32|65> OP_CHECKSIG OP_BOOLOR OP_SWAP <pubkey:33|65> OP_CHECKSIG OP_AND OP_VERIF
OP_SHA256 <data:32> OP_EQUAL OP_SWAP <pubkey:33|65> OP_CHECKSIG OP_BOOLOR OP_SWAP <pubkey:33|65> OP_CHECKSIG OP_BOOLAND
OP_SIZE <data:1> <data:1> OP_WITHIN OP_SWAP OP_SHA256 <data:32> OP_EQUAL OP_BOOLAND OP_SWAP <pubkey:33|65> OP_CHECKSIGVERIFY OP_SWAP <pubkey:33|65> OP_CHECKSIG OP_BOOLOR
OP_DUP OP_SIZE <data:1> <data:1> OP_WITHIN OP_SWAP OP_SHA256 <data:32> OP_EQUAL OP_BOOLAND OP_IF OP_DROP OP_ELSE <pubkey:33|65> OP_CHECKSIGVERIFY OP_ENDIF <pubkey:33|65> OP_CHECKSIG
OP_IF 2 <pubkey:33|65> <pubkey:33|65> 2 OP_CHECKMULTISIGVERIFY OP_ELSE 2 <pubkey:33|65> <pubkey:33|65> 2 OP_CHECKMULTISIGVERIFY OP_ENDIF
OP_DUP OP_HASH160 <data:20> OP_EQUALVERIFY OP_SWAP OP_DUP OP_HASH160 <data:20> OP_EQUALVERIFY 2 OP_ROT OP_ROT 2 OP_CHECKMULTISIG
OP_HASH256 <data:32> OP_EQUALVERIFY OP_DUP OP_HASH160 <data:20> OP_EQUALVERIFY OP_CHECKSIG




Some of these outputs are "insecure" in that once the true owner tries to spend them (or sign a claim), anyone else could modify the output and steal the coins and/or claim.  One example is  af32bb06f12f2ae5fdb7face7cd272be67c923e86b7a66a76ded02d954c2f94d:0.  The output is encumbered only by a hash (no signature required).  The ScriptSig needed to produce a valid txn is thus just some value which when hashed produces the hash value in the PkScript (output script).  The problem is that when the owner attempts to spend those outputs any other node (including a miner) could modify the txn substituting their own output instead.  Since the required input is just a value with no signature there is no way to prevent this (other than trusting a miner in secret).  I am not exactly sure how they should be handled but I am leaning towards truncating them for security reasons.

Quote
Also note the "dent" in the distribution due to the dust limit.  If the dust-limit had been in effect since 2009, we'd have a much cleaner UTXO set.  (I expect those ~million small claims to the left of the yellow band were probably the result of people experimenting on the main network rather than the testnet).

A large portion of the sub dust txns (especially the 1 satoshi variety) are spam from Satoshi's Dice.  The service used a 1 satoshi (effectively unspendable) output to notify betters of a failed bet.  They were a significant reason for the introduction of the dust output rules.  At one time the 1 satoshi outputs represented almost half of the UTXO.   Actual spendable outputs tend to grow rather slowly so without the dust rules it is entirely possible at some point the overwhelming majority of the UTXO would have been dust spam.  Despite the controversy at the time the dust rules are both logical (what purpose is there in creating outputs which are marked as spendable, stored by all nodes forever, and yet are uneconomical to spend) and necessary to prevent the UTXO from becoming utterly unmanageable.
Pages: « 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 [17] 18 19 20 21 22 23 24 25 26 »
  Print  
 
Jump to:  

Bitcointalk.org is not available or authorized for sale. Do not believe any fake listings.
Sponsored by , a Bitcoin-accepting VPN.
Powered by MySQL Powered by PHP Powered by SMF 1.1.19 | SMF © 2006-2009, Simple Machines Valid XHTML 1.0! Valid CSS!