Question about base58 encoding

Peter R (OP)

Legendary

Offline

Activity: 1162
Merit: 1007

Question about base58 encoding

May 17, 2014, 07:43:57 PM

#1

I'm slowly working my way through the technical details of bitcoin by writing custom Mathematica code (just for learning purpose). Last night I wrote my own code to convert an ECDSA public key into a bitcoin address. I (think I) understand everything except for the leading "1"s in the bitcoin address. Here's my Mathematica function for computing the bitcoin address from a public key :

Code:

bitcoinaddress[publicKey_Integer] := Module[{h1, h2, addr},
   h1 = ripemd160[sha256[publicKey, 65], 32];
   h2 = sha256[sha256[h1, 21], 32];
   addr = 2^32 h1 + Quotient[h2, 2^224];
   Return[ToBase58[addr]]];

(*Note that the second argument in the hash functions is the number of message bytes to assume since Mathematica is using arbitrary-precision integers*)

And here's the flawed code I'm using for ToBase58[]. The flaw is at the end when I prepend the "1" in order to get the "right answer," but I realize this is a hack and will not always work if there are leading zeros in the pub-key hash:

Code:

ToBase58[y_Integer] := 
 Module[{codestring = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz", outputString = "", x = y},
 
    While[x > 0, outputString = outputString <> StringTake[codestring, {Mod[x, 58] + 1}]; x = Quotient[x, 58]];

    Return["1" <> StringReverse[outputString]]];  (*prepending the "1" like this is wrong*)

On the bitcoin wiki, it says:

What I don't understand are these things in red:

address_byte_string (consisting of 0x01 + hash + 4-byte_check_code)

If I prepend 0x01 prior to converting to base58 I get the wrong address, whereas if I ignore this byte I get the correct address.

repeat(number_of_leading_zero_bytes_in_hash)

^^This seems strange and arbitrary to me. I add an extra "1" for every leading byte that is 0?

NOTE TO READERS: Do not use this code for anything important. Use a real bitcoin library.

Run Bitcoin Unlimited (www.bitcoinunlimited.info)

piotr_n

Legendary

Offline

Activity: 2053
Merit: 1354

aka tonikt

Re: Question about base58 encoding

May 17, 2014, 08:54:47 PM
Last edit: May 17, 2014, 09:33:12 PM by piotr_n

#2

You add 0x01 at input while you should be adding '1' at the output

Yeah, in bitcoin's base58 encoder there is a weird thing.
At the end of all this math, for every byte of value 0 from the beginning of the input data, you add one character 1 to the output string.
So if there was one zero byte, you add one character. If there were two, you add two and so on...
That's probably what you're missing.
It's actually not that much weird if you consider that this extra 1s allow to preserve the byte length of the original data, though when you know that the addresses version 0 have fixed length and 4 bytes checksum, then you also figure that it's in fact pretty useless.

edit:
The wiki is a bit misleading with the 0x01, since most addresses have 0x00, but it's still all true what it says.

edit2:
ftfy: https://en.bitcoin.it/wiki/Base58Check_encoding#Base58_symbol_chart

Check out gocoin - my original project of full bitcoin node & cold wallet written in Go.
PGP fingerprint: AB9E A551 E262 A87A 13BB 9059 1BE7 B545 CDF3 FD0E

Peter R (OP)

Legendary

Offline

Activity: 1162
Merit: 1007

Re: Question about base58 encoding

May 17, 2014, 09:39:56 PM

#3

Quote from: piotr_n on May 17, 2014, 08:54:47 PM

You add 0x01 at input while you should be adding '1' at the output

Yeah, in bitcoin's base58 encoder there is a weird thing.
At the end of all this math, for every byte of value 0 from the beginning of the input data, you add one character 1 to the output string.
So if there was one zero byte, you add one character. If there were two, you add two and so on...
That's probably what you're missing.
It's actually not that much weird if you consider that this extra 1s allow to preserve the byte length of the original data, though when you know that the addresses version 0 have fixed length and 4 bytes checksum, then you also figure that it's in fact pretty useless.

edit:
The wiki is a bit misleading with the 0x01, since most addresses have 0x00, but it's still all true what it says.

edit2:
ftfy: https://en.bitcoin.it/wiki/Base58Check_encoding#Base58_symbol_chart

Thanks piotr! If the 0x01 is not really supposed to be there, then it all makes sense to me. I still find it odd to use a base58 digit to represent a byte, but I guess that's probably actually the most reasonable way to have done it.

Run Bitcoin Unlimited (www.bitcoinunlimited.info)

piotr_n

Legendary

Offline

Activity: 2053
Merit: 1354

aka tonikt

Re: Question about base58 encoding

May 17, 2014, 09:45:29 PM

#4

It's the way it's done, but whether it was the most reasonable way to do it, I'd have argued about it.

Anyway, glad I could help.
And reading the wiki remember that it's not the ultimate spec, but we may fix it, as we discover fragments to improve, as you just did.

Check out gocoin - my original project of full bitcoin node & cold wallet written in Go.
PGP fingerprint: AB9E A551 E262 A87A 13BB 9059 1BE7 B545 CDF3 FD0E

DeathAndTaxes

Donator
Legendary

Offline

Activity: 1218
Merit: 1079

Gerald Davis

Re: Question about base58 encoding

May 17, 2014, 10:06:16 PM
Last edit: May 18, 2014, 08:30:22 AM by DeathAndTaxes

#5

Quote from: piotr_n on May 17, 2014, 09:45:29 PM

It's the way it's done, but whether it was the most reasonable way to do it, I'd have argued about it.

Exactly. It is another "Satoshism". It makes things more complex for no obvious benefit/reason. Each one by itself isn't that hard to workthrough but collectively they add up to reduce the transparency of the code.

This does mean that some Bitcoin addresses are shorter however each digit shorter is progressively less likely. Most (99.7%) addresses are 34 or 35 digits long. I can't see that being a good reason for all the complexity. It actually makes it slightly more confusing because you can't say "All Bitcoin addresses are X digits long", you can't even say 34 or 35 digits because in theory they can be as short as 20 digits (version 0x0 + payload/hash of all zeroes + checksum). No idea why Satoshi didn't just do something like prepend the version byte (where valid version byte is any value other than 0x00) and then convert to base58. No need for any weird leading 0x00 = "1" check. By using a version byte of 0x01 or greater the leading zeroes of the hash would be preserved.

It isn't ever going to change now but it does make you scratch your head and it makes it harder to understand the code because there is no obvious benefit (which helps when trying to figure out what a chunk of code is doing).

gmaxwell

Moderator
Legendary

Offline

Activity: 4158
Merit: 8382

Re: Question about base58 encoding

May 18, 2014, 07:41:51 AM

#6

FWIW, I think the current bignum free code we have in the reference client, now is moderately more clear than whats in the wiki. https://github.com/bitcoin/bitcoin/blob/master/src/base58.cpp#L66