Bitcoin Forum
December 16, 2018, 02:21:57 PM *
News: Latest Bitcoin Core release: 0.17.0 [Torrent].
 
   Home   Help Search Login Register More  
Pages: [1]
  Print  
Author Topic: Question about base58 encoding  (Read 1027 times)
Peter R
Legendary
*
Offline Offline

Activity: 1162
Merit: 1007



View Profile
May 17, 2014, 07:43:57 PM
 #1

I'm slowly working my way through the technical details of bitcoin by writing custom Mathematica code (just for learning purpose).  Last night I wrote my own code to convert an ECDSA public key into a bitcoin address.  I (think I) understand everything except for the leading "1"s in the bitcoin address.  Here's my Mathematica function for computing the bitcoin address from a public key :

Code:
bitcoinaddress[publicKey_Integer] := Module[{h1, h2, addr},
   h1 = ripemd160[sha256[publicKey, 65], 32];
   h2 = sha256[sha256[h1, 21], 32];
   addr = 2^32 h1 + Quotient[h2, 2^224];
   Return[ToBase58[addr]]];

(*Note that the second argument in the hash functions is the number of message bytes to assume since Mathematica is using arbitrary-precision integers*)

And here's the flawed code I'm using for ToBase58[].  The flaw is at the end when I prepend the "1" in order to get the "right answer," but I realize this is a hack and will not always work if there are leading zeros in the pub-key hash:

Code:
ToBase58[y_Integer] :=
 Module[{codestring = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz", outputString = "", x = y},
 
    While[x > 0, outputString = outputString <> StringTake[codestring, {Mod[x, 58] + 1}]; x = Quotient[x, 58]];

    Return["1" <> StringReverse[outputString]]];  (*prepending the "1" like this is wrong*)



On the bitcoin wiki, it says:




What I don't understand are these things in red:

   address_byte_string (consisting of 0x01 + hash + 4-byte_check_code)

If I prepend 0x01 prior to converting to base58 I get the wrong address, whereas if I ignore this byte I get the correct address.

   repeat(number_of_leading_zero_bytes_in_hash)

^^This seems strange and arbitrary to me.  I add an extra "1" for every leading byte that is 0?



NOTE TO READERS: Do not use this code for anything important.  Use a real bitcoin library.  

Run Bitcoin Unlimited (www.bitcoinunlimited.info)
1544970117
Hero Member
*
Offline Offline

Posts: 1544970117

View Profile Personal Message (Offline)

Ignore
1544970117
Reply with quote  #2

1544970117
Report to moderator
PLAY NOW
Advertised sites are not endorsed by the Bitcoin Forum. They may be unsafe, untrustworthy, or illegal in your jurisdiction. Advertise here.
piotr_n
Legendary
*
Offline Offline

Activity: 1960
Merit: 1039


aka tonikt


View Profile WWW
May 17, 2014, 08:54:47 PM
 #2

You add 0x01 at input while you should be adding '1' at the output

Yeah, in bitcoin's base58 encoder there is a weird thing.
At the end of all this math, for every byte of value 0 from the beginning of the input data, you add one character 1 to the output string.
So if there was one zero byte, you add one character. If there were two, you add two and so on...
That's probably what you're missing.
It's actually not that much weird if you consider that this extra 1s allow to preserve the byte length of the original data, though when you know that the addresses version 0 have fixed length and 4 bytes checksum, then you also figure that it's in fact pretty useless.

edit:
The wiki is a bit misleading with the 0x01, since most addresses have 0x00, but it's still all true what it says.

edit2:
ftfy: https://en.bitcoin.it/wiki/Base58Check_encoding#Base58_symbol_chart

Check out gocoin - my original project of full bitcoin node & cold wallet written in Go.
PGP fingerprint: AB9E A551 E262 A87A 13BB  9059 1BE7 B545 CDF3 FD0E
Peter R
Legendary
*
Offline Offline

Activity: 1162
Merit: 1007



View Profile
May 17, 2014, 09:39:56 PM
 #3

You add 0x01 at input while you should be adding '1' at the output

Yeah, in bitcoin's base58 encoder there is a weird thing.
At the end of all this math, for every byte of value 0 from the beginning of the input data, you add one character 1 to the output string.
So if there was one zero byte, you add one character. If there were two, you add two and so on...
That's probably what you're missing.
It's actually not that much weird if you consider that this extra 1s allow to preserve the byte length of the original data, though when you know that the addresses version 0 have fixed length and 4 bytes checksum, then you also figure that it's in fact pretty useless.

edit:
The wiki is a bit misleading with the 0x01, since most addresses have 0x00, but it's still all true what it says.

edit2:
ftfy: https://en.bitcoin.it/wiki/Base58Check_encoding#Base58_symbol_chart

Thanks piotr!  If the 0x01 is not really supposed to be there, then it all makes sense to me.  I still find it odd to use a base58 digit to represent a byte, but I guess that's probably actually the most reasonable way to have done it.  

Run Bitcoin Unlimited (www.bitcoinunlimited.info)
piotr_n
Legendary
*
Offline Offline

Activity: 1960
Merit: 1039


aka tonikt


View Profile WWW
May 17, 2014, 09:45:29 PM
 #4

It's the way it's done, but whether it was the most reasonable way to do it, I'd have argued about it. Smiley

Anyway, glad I could help.
And reading the wiki remember that it's not the ultimate spec, but we may fix it, as we discover fragments to improve, as you just did.

Check out gocoin - my original project of full bitcoin node & cold wallet written in Go.
PGP fingerprint: AB9E A551 E262 A87A 13BB  9059 1BE7 B545 CDF3 FD0E
DeathAndTaxes
Donator
Legendary
*
Offline Offline

Activity: 1218
Merit: 1005


Gerald Davis


View Profile
May 17, 2014, 10:06:16 PM
 #5

It's the way it's done, but whether it was the most reasonable way to do it, I'd have argued about it. Smiley

Exactly.  It is another "Satoshism".  It makes things more complex for no obvious benefit/reason.  Each one by itself isn't that hard to workthrough but collectively they add up to reduce the transparency of the code.

This does mean that some Bitcoin addresses are shorter however each digit shorter is progressively less likely.  Most (99.7%) addresses are 34 or 35 digits long.   I can't see that being a good reason for all the complexity.  It actually makes it slightly more confusing because you can't say "All Bitcoin addresses are X digits long", you can't even say 34 or 35 digits because in theory they can be as short as 20 digits (version 0x0 + payload/hash of all zeroes + checksum).   No idea why Satoshi didn't just do something like prepend the version byte (where valid version byte is any value other than 0x00) and then convert to base58.  No need for any weird leading 0x00 = "1" check.  By using a version byte of 0x01 or greater the leading zeroes of the hash would be preserved.

It isn't ever going to change now but it does make you scratch your head and it makes it harder to understand the code because there is no obvious benefit (which helps when trying to figure out what a chunk of code is doing).
gmaxwell
Moderator
Legendary
*
qt
Offline Offline

Activity: 2590
Merit: 1754



View Profile
May 18, 2014, 07:41:51 AM
 #6

FWIW, I think the current bignum free code we have in the reference client, now is moderately more clear than whats in the wiki. https://github.com/bitcoin/bitcoin/blob/master/src/base58.cpp#L66

Bitcoin will not be compromised
Pages: [1]
  Print  
 
Jump to:  

Sponsored by , a Bitcoin-accepting VPN.
Powered by MySQL Powered by PHP Powered by SMF 1.1.19 | SMF © 2006-2009, Simple Machines Valid XHTML 1.0! Valid CSS!