Title: [RFC] c32d encoding
Post by: Luke-Jr on May 15, 2013, 06:32:24 AM
This encoding is designed so that it could replace Base58Check in new data, with the following goals in mind: - Impossible(?) to manipulate without completely changing it
- Clearly identifiable prefix, regardless of data size
- Cheaper to process (simpler and faster code; it's a power-of-two radix)
- Fixed length string for fixed length data
- More unambiguous (removal of chars 'isuvzSVZ')
- Compatible with using seven-segment displays
- Altcoin friendly (16 bit namespace, which can be read without decoding)
Since there are fewer digits and more identifying/signature characters, addresses are longer. This should be less of a problem since ordinary users will hopefully be using addresses less common as the payment protocol becomes more popular. Example Python code (including tests) is at the bottom. I can write up a formal BIP if this seems useful. For example: 160 bits of data, such as current addresses: - 2nc111dhAPE2aUdYAOF88JhLn5jEjbULy4eFe9tyFYFE8
An ordinary P2SH destination, incorporating Greg's "require the hash mid-image to be relayed" concept (256 bits of data): - 2bc511A95e74P13dPb6b5t7yrh12EhC363ayH98n1cFbr3rAHdA49nCcC1G3P71j
The same key in Namecoin: - 2nc5119ttL35HPhc3Hh6aHe2tOhF6rdFtAOE1ahFLt9Ecabhcn5FLea5Le71P56C
The example "puzzle" script from the wiki (arbitrary scripting): - 2bc311d126acCyAnHAjabeUtOHcr7F811j4UYE6ECtOcbcGGn4O9chAt7O7y2LU9ty9cnG4
An alternative for BIP32 extended public keys (560 bits): - 2bc911AcchHheAGFnn9LC6FdF7bOc99APJtcEc46U655JheH6LCr3Y333eFEOtPJ9rj22rEcchHheAG Fnn9LC6FdF7bOc99APJtcEc46U655JheH6LCr3YJCtPYea
An alternative for BIP32 extended private keys (552 bits): - 2bcb11O77GHdP53FH7Jh44OdEh3rLd4eFr2h7c8rGeErELG18yCy9O7L9LednyHJa5hyeAP77GHdP53 FH7Jh44OdEh3rLd4eFr2h7c8rGeErELG18yCyGG5drPF1
c32.py # Digits are chosen to be the least ambiguous # They all have unique seven-segment glyphs, and cannot be easily confused by humans digits = '123456789abcdehjnrtyACEFGHJLOPUY' radix = len(digits)
def encode(v): if not len(v): return '' n = 0 bits = 0 o = [] pad = (len(v) * 8) % 5 if pad: v = b'\0' + v v = bytearray(v) # For Python 2.7 compatibility for i in range(len(v) - 1, -1, -1): n |= v[i] << bits bits += 8 while bits >= 5: o.insert(0, digits[n & 0x1f]) n >>= 5 bits -= 5 if i == 0 and pad: break return ''.join(o)
def decode(s): n = 0 bits = 0 o = bytearray() for i in range(len(s) - 1, -1, -1): n |= digits.index(s[i]) << bits bits += 5 while bits >= 8: o.insert(0, n & 0xff) n >>= 8 bits -= 8 return bytes(o)
def test(): from math import ceil assert '' == encode(b'') for (i, oc) in ( (1, '8'), (2, '2'), (3, 'j'), (4, '4'), (5, 'Y'), (6, '8'), (7, '2'), (8, 'j'), (9, '4'), ): ol = int(ceil(i * 8 / 5.)) try: inzero = b'\0' * i inone = b'\xff' * i ezero = encode(inzero) eone = encode(inone) dzero = decode(ezero) done = decode(eone) assert ezero == '1' * ol assert eone == oc + ('Y' * (ol - 1)) assert dzero == inzero assert done == inone except AssertionError: raise AssertionError('Input of length %s failed test' % (i,)) try: for c in range(1024): decode('111' + chr(c)) except ValueError: pass else: raise AssertionError('Invalid decode input (%02x) did not throw a ValueError' % (c,))
if __name__ == '__main__': test() print("Tests passed")
c32d.py import c32 import hashlib import struct
def _checksum(v): return hashlib.sha256(hashlib.sha256(v).digest()).digest()[-4:]
''' String format: - c32(Raw format) in reverse order
Raw format: - 4 bytes checksum - N bytes data (NOTE: encoded to prevent hidden changes) - - for script: - - - N bytes: varint preimage data length - - - N bytes: preimage data - - - N bytes: script data - - for BIP32 HD parent key: - - - 32 bytes: chain code - - - 33 bytes: parent pubkey - - for BIP32 serialized key: - - - 1 byte: depth - - - 4 bytes: child number - - - 32 bytes: chain code - - - One of: - - - - 32 bytes: private key data - - - - 33 bytes: public key data - 1 byte flag (ignored if unknown) - 1 byte type - - 01 script (with preimage data) - - 02 script hash preimage - - 03 BIP32 HD parent key - - 04 BIP32 serialized public key - 2 bytes namespace (blockchain id) - - 2d41 Bitcoin ('2bc') - - 2e01 Namecoin ('2nc') - - 2e37 Freicoin ('FRC') '''
class c32d: __slots__ = ('data', 'ns', 'dtype', 'dflag') def __init__(self, data, ns, dtype, dflag): self.data = data self.ns = ns self.dtype = dtype self.dflag = dflag @classmethod def decode(cls, s, raw = False): if not raw: full = c32.decode(s[::-1]) else: full = s csum = bytearray(full[:4]) v = bytearray(full[4:]) # Encode the configuration bytes to simplify decoding pv = 0xbf for i in range(len(v) - 1, len(v) - 5, -1): pv = v[i] ^ (csum[i % 4]) ^ pv v[i] = pv v.append(0xbf) for i in range(len(v) - 1): v[i] ^= csum[i % 4] ^ v[i + 1] v.pop() v = bytes(v) if csum != _checksum(v): raise ValueError('c32d checksum wrong') o = cls(None, None, None, None) o.data = v[:-4] o.dflag = v[-4] o.dtype = v[-3] o.ns = struct.unpack('!H', v[-2:])[0] return o def encode(self, raw = False): try: v = self.data + struct.pack('!BBH', self.dflag, self.dtype, self.ns) except struct.error as e: raise ValueError(e) csum = bytearray(_checksum(v)) v = bytearray(v) pv = 0xbf for i in range(len(v) - 1, -1, -1): pv = v[i] ^ csum[i % 4] ^ pv if i < len(v) - 4: v[i] = pv v = csum + bytes(v) if raw: return v return c32.encode(v)[::-1]
decode = c32d.decode
def encode(*a, **ka): return c32d(*a, **ka).encode()
def test(): c32.test() for (p, s, raw) in ( ((b'', 0, 0, 0), '1111115Fd9acc', b'\xb5\xa5\x0c\xb9\x00\x00\x00\x00'), ((b'test', 4232, 142, 219), '955OGe8hOGc97hH4EJj1', b'?V\x1e\\d/\x1cq\xdb\x8e\x10\x88'), ((b'\xff' * 0x100, 0xffff, 0xff, 0xff), 'YYYYYYc327OYcC6F9Or6r14UYCJtc5UGt9n2YYbeH63jda5GnYjCEO3r8E53dGYFbchrG4c327OYcC6F9Or6r14UYCJtc5UGt9n2YYbeH63jda5GnYjCEO3r8E53dGYFbchrG4c327OYcC6F9Or6r14UYCJtc5UGt9n2YYbeH63jda5GnYjCEO3r8E53dGYFbchrG4c327OYcC6F9Or6r14UYCJtc5UGt9n2YYbeH63jda5GnYjCEO3r8E53dGYFbchrG4c327OYcC6F9Or6r14UYCJtc5UGt9n2YYbeH63jda5GnYjCEO3r8E53dGYFbchrG4c327OYcC6F9Or6r14UYCJtc5UGt9n2YYbeH63jda5GnYjCEO3r8E53dGYFbchrG4c327OYcC6F9Or6r14UYCJtc5UGb2cOdG3', b'\xb0\xce,*\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xc7\x88\xb9j\xbf\xf0\xc1\x12\xff\xff\xff\xff'), ): (kp, pp) = ({}, p) for i in range(2): o = c32d(*pp, **kp) assert o.data == p[0] assert o.ns == p[1] assert o.dtype == p[2] assert o.dflag == p[3] kp = { 'data': p[0], 'ns': p[1], 'dtype': p[2], 'dflag': p[3], } pp = () assert o.encode() == s assert o.encode(raw=True) == raw def ensureValueError(f): try: f() except ValueError: pass else: raise AssertionError('Invalid decode input did not throw a ValueError') ensureValueError(lambda: encode(b'', -1, 0, 0)) ensureValueError(lambda: encode(b'', 0x10000, 0, 0)) ensureValueError(lambda: encode(b'', 0, -1, 0)) ensureValueError(lambda: encode(b'', 0, 0x100, 0)) ensureValueError(lambda: encode(b'', 0, 0, -1)) ensureValueError(lambda: encode(b'', 0, 0, 0x100)) # Invalid c32d ensureValueError(lambda: decode('1111115Fd9adc')) ensureValueError(lambda: decode('11A1115Fd9acc')) # Invalid c32 ensureValueError(lambda: decode('111x115Fd9acc')) ensureValueError(lambda: decode('1111115Fd9acx'))
if __name__ == '__main__': test() print("Tests passed")
Title: Re: [RFC] c32d encoding
Post by: casascius on May 15, 2013, 02:45:25 PM
This would be my wish list for a new alternate address encoding:
1. Some sort of datestamp/expiry field so that users could mark their addresses so clients would throw up warnings if someone attempts to pay them at some point in the future beyond the point when the address is expected to be used. The field could be optional and missing to make the address valid forever, and could have variable resolution (e.g. a time portion only if it's truly sensitive with resolution greater than 1 day). One could still pay an expired address, they would just have to wade through a lot of protest from their client and perhaps have it in an advanced mode.
2. Checksum/error correction based on Reed-Solomon instead of SHA256, so that single-position errors, up to a limited number, have a 100% probability of being correctable and a 0% probability of being corrected to the wrong thing.
3. Of course, compatibility with payment protocols.
Meanwhile, I would submit the following:
4. In practice, the ambiguity between symbols such as "isuvzSVZ" is, in my estimation, a low concern, given the widespread emphasis on copying, pasting, or scanning bitcoin addresses and not typing them in. While I have done it myself many times with the unusual way I do bitcoins, I have never seen a single potential payee say "Here, hand-type this Bitcoin address into your Bitcoin client." Further, with the proper Reed-Solomon error correction, visually similar substitutions could be routinely reliably corrected. (That said though, there's also a value in a power-of-two radix, and if throwing away some symbols helps us achieve that, well then... this is a great idea)
5. The prefix (in this case "2") should be useful to the user to help them identify what kind of object type they are looking at. In terms of usability, it is a really costly design decision to use the same prefix for an address versus a private key and to rely on the user to distinguish the two using a software tool or by somehow memorizing the length. Having the same prefix with distinction by length etc. would be a great idea for an altcoin used only by a niche community with specialized skills (of the sort who would learn Klingon etc.), but for a coin meant the public at large, a public who will suffer from confusion and errors as a result of this, this really needs to not be forgotten.
|