Bitcoin Forum
November 18, 2017, 09:58:51 PM *
News: Latest stable version of Bitcoin Core: 0.15.1  [Torrent].
 
   Home   Help Search Donate Login Register  
Pages: [1]
  Print  
Author Topic: How much rarer are short bitcoin addresses?  (Read 2416 times)
RoxxR
Full Member
***
Offline Offline

Activity: 178


View Profile
November 23, 2013, 11:13:00 AM
 #1

I'm not very good at math... but I have noticed that 33-character addresses are about 20 times less common than 34-character addresses. Similarly, are 32-character addresses 20 times less common than 33-character addresses (and so on)?

I would really like to see a math guru confirm this.
Advertised sites are not endorsed by the Bitcoin Forum. They may be unsafe, untrustworthy, or illegal in your jurisdiction. Advertise here.
1511042331
Hero Member
*
Offline Offline

Posts: 1511042331

View Profile Personal Message (Offline)

Ignore
1511042331
Reply with quote  #2

1511042331
Report to moderator
michagogo
Member
**
Offline Offline

Activity: 80


View Profile
November 23, 2013, 04:44:52 PM
 #2

http://bitcoin.stackexchange.com/questions/2013/why-does-the-length-of-a-bitcoin-key-vary
BurtW
Legendary
*
Offline Offline

Activity: 2086

All paid signature campaigns should be banned.


View Profile WWW
November 23, 2013, 04:56:09 PM
 #3


That does not answer the question about the distribution of Bitcoin address lengths.  I do remember seeing a post about that a long time ago.  Try the search function to see if you can find it.

Our family was terrorized by Homeland Security.  Read all about it here:  http://www.jmwagner.com/ and http://www.burtw.com/  Any donations to help us recover from the $300,000 in legal fees and forced donations to the Federal Asset Forfeiture slush fund are greatly appreciated!
cr1776
Legendary
*
Offline Offline

Activity: 1708


View Profile
November 24, 2013, 07:31:18 AM
 #4

They just vary due to encoding, so (somewhat simplified) if you had two:
12345678
00098765  could be written as 98765

That doesn't quantify it, but this explains:

Quote
Most Bitcoin addresses are 34 characters. They consist of random digits and uppercase and lowercase letters, with the exception that the uppercase letter "O", uppercase letter "I", lowercase letter "l", and the number "0" are never used to prevent visual ambiguity.
Some Bitcoin addresses can be shorter than 34 characters (as few as 27 in theory) and still be valid. A significant percentage of Bitcoin addresses are only 33 characters, and some addresses may be even shorter. Every Bitcoin address stands for a number - somewhat like an account number. These shorter addresses are valid simply because they stand for numbers that happen to start with zeroes, and when the zeroes are omitted, the encoded address gets shorter.
Several of the characters inside a Bitcoin address are used as a checksum so that typographical errors can be automatically found and rejected. The checksum also allows Bitcoin software to confirm that a 33-character (or shorter) address is in fact valid and isn't simply an address with a missing character.



I'm not very good at math... but I have noticed that 33-character addresses are about 20 times less common than 34-character addresses. Similarly, are 32-character addresses 20 times less common than 33-character addresses (and so on)?

I would really like to see a math guru confirm this.
piotr_n
Legendary
*
Offline Offline

Activity: 1778


aka tonikt


View Profile WWW
November 24, 2013, 09:56:58 AM
 #5

I'm not very good at math... but I have noticed that 33-character addresses are about 20 times less common than 34-character addresses. Similarly, are 32-character addresses 20 times less common than 33-character addresses (and so on)?

I would really like to see a math guru confirm this.

It's as you say, except that the factor is 58, not 20.
Each one character shorter address should be 58 times rarer...

Check out gocoin - my original project of full bitcoin node & cold wallet written in Go.
PGP fingerprint: AB9E A551 E262 A87A 13BB  9059 1BE7 B545 CDF3 FD0E
flatfly
Legendary
*
Offline Offline

Activity: 994


View Profile
November 24, 2013, 10:30:07 AM
 #6

I'm not very good at math... but I have noticed that 33-character addresses are about 20 times less common than 34-character addresses. Similarly, are 32-character addresses 20 times less common than 33-character addresses (and so on)?

I would really like to see a math guru confirm this.

It's as you say, except that the factor is 58, not 20.
Each one character shorter address should be 58 times rarer...

Nope, the factor is not 58. I haven't done the math (not enough time for this now) but from my quick initial testing, the factor seems to be roughly 22.5, at least for 33-character addresses. Don't forget to take leading zeros into account.

1111127SpvabYpoeDoiz5L7QPkfiSh2Q. Only donate if you have a reason to.
piotr_n
Legendary
*
Offline Offline

Activity: 1778


aka tonikt


View Profile WWW
November 24, 2013, 11:05:17 AM
 #7

I'm not very good at math... but I have noticed that 33-character addresses are about 20 times less common than 34-character addresses. Similarly, are 32-character addresses 20 times less common than 33-character addresses (and so on)?

I would really like to see a math guru confirm this.

It's as you say, except that the factor is 58, not 20.
Each one character shorter address should be 58 times rarer...

Nope, the factor is not 58. I haven't done the math (not enough time for this now) but from my quick initial testing, the factor seems to be roughly 22.5, at least for 33-character addresses. Don't forget to take leading zeros into account.

You're right - sorry, my mistake.
The first byte is always zero, which means that the first factor is lower.
13.140625, if I did the math right... (58*58/256)
But any further shorter ones should use the factor 58 / char.

Check out gocoin - my original project of full bitcoin node & cold wallet written in Go.
PGP fingerprint: AB9E A551 E262 A87A 13BB  9059 1BE7 B545 CDF3 FD0E
piotr_n
Legendary
*
Offline Offline

Activity: 1778


aka tonikt


View Profile WWW
November 24, 2013, 11:24:28 AM
 #8

I believe the proper calculation for the first factor comes from the value:

Code:
0x00FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF / 58 = 0x0469ee58469ee58469ee58469ee58469ee58469ee58469ee

And the chance of 0x0469ee58469ee58469ee58469ee58469ee58469ee58469ee having... something to do with mod/div 58 - is the first factor. Smiley

Something like that... Haven't had my fifth coffee yet today Wink


EDIT:
If this time I did not get it wrong again, it might go like this (python code):
Code:
>>> x = 0x00FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF / 58
>>> 1 / (1 - float(x - 58**31) / x)
23.337985918792462

Check out gocoin - my original project of full bitcoin node & cold wallet written in Go.
PGP fingerprint: AB9E A551 E262 A87A 13BB  9059 1BE7 B545 CDF3 FD0E
flatfly
Legendary
*
Offline Offline

Activity: 994


View Profile
November 24, 2013, 02:18:10 PM
 #9

I believe the proper calculation for the first factor comes from the value:

Code:
0x00FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF / 58 = 0x0469ee58469ee58469ee58469ee58469ee58469ee58469ee

And the chance of 0x0469ee58469ee58469ee58469ee58469ee58469ee58469ee having... something to do with mod/div 58 - is the first factor. Smiley

Something like that... Haven't had my fifth coffee yet today Wink


EDIT:
If this time I did not get it wrong again, it might go like this (python code):
Code:
>>> x = 0x00FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF / 58
>>> 1 / (1 - float(x - 58**31) / x)
23.337985918792462

That looks much better Smiley
However the next factors are much, MUCH higher than 58, it seems. I haven't been able to figure it out completely, though. This question is trickier than it seems...

1111127SpvabYpoeDoiz5L7QPkfiSh2Q. Only donate if you have a reason to.
piotr_n
Legendary
*
Offline Offline

Activity: 1778


aka tonikt


View Profile WWW
November 24, 2013, 04:58:53 PM
 #10

However the next factors are much, MUCH higher than 58, it seems. I haven't been able to figure it out completely, though. This question is trickier than it seems...
I think I know why.

It is the part of satoshi's EncodeBase58() function:
Code:
   // Leading zeroes encoded as base58 zeros
    for (const unsigned char* p = pbegin; p < pend && *p == 0; p++)
        str += pszBase58[0];

So for every zero-byte at the beginning, the encode function adds '1' in front of the string - effectively forcing the string to come out longer.

Using this function, I've made a simulation on 20 millions random addresses and indeed not even once we get a string shorter than 32 characters. The statistical results are like this:
Code:
34 chars: ~96.026255%
33 chars: ~3.973735%
32 chars: ~0.00001%

But when we modify the EncodeBase58 function to break the loop (after putting the first '1' in front), then after the 20 million rounds we get stats like this:
Code:
34 chars: 95.714475%
33 chars: 4.210685%
32 chars: 0.073435%
31 chars: 0.00139%
30 chars: 0.000015%

So in other words: if you want to replace trailing '11..' with single '1..' - then you get the expected factor 58, for the even shorter addresses.

And the bottom line is that EncodeBase58 function in the satoshi client is stupidly implemented, because there is no reason whatsoever for an address to start with more than one '1', and even if it does you can just remove the extra ones and the address will still work just fine.

Check out gocoin - my original project of full bitcoin node & cold wallet written in Go.
PGP fingerprint: AB9E A551 E262 A87A 13BB  9059 1BE7 B545 CDF3 FD0E
flatfly
Legendary
*
Offline Offline

Activity: 994


View Profile
November 24, 2013, 10:37:36 PM
 #11

However the next factors are much, MUCH higher than 58, it seems. I haven't been able to figure it out completely, though. This question is trickier than it seems...
I think I know why.

It is the part of satoshi's EncodeBase58() function:
Code:
   // Leading zeroes encoded as base58 zeros
    for (const unsigned char* p = pbegin; p < pend && *p == 0; p++)
        str += pszBase58[0];

So for every zero-byte at the beginning, the encode function adds '1' in front of the string - effectively forcing the string to come out longer.

Using this function, I've made a simulation on 20 millions random addresses and indeed not even once we get a string shorter than 32 characters. The statistical results are like this:
Code:
34 chars: ~96.026255%
33 chars: ~3.973735%
32 chars: ~0.00001%

But when we modify the EncodeBase58 function to break the loop (after putting the first '1' in front), then after the 20 million rounds we get stats like this:
Code:
34 chars: 95.714475%
33 chars: 4.210685%
32 chars: 0.073435%
31 chars: 0.00139%
30 chars: 0.000015%

So in other words: if you want to replace trailing '11..' with single '1..' - then you get the expected factor 58, for the even shorter addresses.

And the bottom line is that EncodeBase58 function in the satoshi client is stupidly implemented, because there is no reason whatsoever for an address to start with more than one '1', and even if it does you can just remove the extra ones and the address will still work just fine.

Very interesting. Thanks for running the tests! Perhaps the function was coded this way to avoid confusing newbies with address lengths that vary too much? Hard to be sure... but i can't think of another reason.

1111127SpvabYpoeDoiz5L7QPkfiSh2Q. Only donate if you have a reason to.
piotr_n
Legendary
*
Offline Offline

Activity: 1778


aka tonikt


View Profile WWW
November 25, 2013, 09:33:58 AM
 #12

Perhaps the function was coded this way to avoid confusing newbies with address lengths that vary too much? Hard to be sure... but i can't think of another reason.
I believe it is to assure that the specific implementation of the decode function will always return 25 bytes, since no padding is being done there.

BTW, in theory there are eight possible "valid" addresses with the length of 26 characters:
Code:
11111111111111111111BZbvjr
11111111111111111111HeBAGj
11111111111111111111QekFQw
11111111111111111111UpYBrS
11111111111111111111g4hiWR
11111111111111111111jGyPM8
11111111111111111111o9FmEC
11111111111111111111ufYVpS

And I was wrong again; removing any of the trailing '1' will cause the satoshi client to assume the address as "invalid", even though it is technically quite valid, since e.g. both; 1BZbvjr and 11111111111111111111BZbvjr represent exactly the same 200-bit number (which a bitcoin address essentially is).

But IMHO it is a flaw in the implementation - e.g. my client considers addresses with redundant trailing '1' characters are valid and equal.
Hell you can even skip the first '1' - the 32-bit checksum that si there anyway should be just fine to assure no typos.

Check out gocoin - my original project of full bitcoin node & cold wallet written in Go.
PGP fingerprint: AB9E A551 E262 A87A 13BB  9059 1BE7 B545 CDF3 FD0E
RoxxR
Full Member
***
Offline Offline

Activity: 178


View Profile
November 25, 2013, 10:03:30 AM
 #13

Thanks for this interesting discussion.

Code:
34 chars: ~96.026255%
33 chars: ~3.973735%
32 chars: ~0.00001%

So if I understand this right, I've got my answer:
32 characters are 400,000 times rarer than 33 characters.

Are 31 characters also 400,000 times rarer than 32 characters, or does the factor differ again?
piotr_n
Legendary
*
Offline Offline

Activity: 1778


aka tonikt


View Profile WWW
November 25, 2013, 11:37:08 AM
 #14

So if I understand this right, I've got my answer:
32 characters are 400,000 times rarer than 33 characters.
Statistically, but in this case, there where were only two 32-chars long addresses generated withing the 20 million random samples so the number is not very precise.
It gives you an idea about the magnitude, but the 400,000 might just as well be 150,000 or 1,000,000

So if I understand this right, I've got my answer:
Are 31 characters also 400,000 times rarer than 32 characters, or does the factor differ again?
That's a good question...
Unfortunately I don't know the answer nor have resources to run such a long simulation.
Though back in my high school (which was very long ago), if I had asked my math teacher, she would have probable known how to calculate it Smiley

Check out gocoin - my original project of full bitcoin node & cold wallet written in Go.
PGP fingerprint: AB9E A551 E262 A87A 13BB  9059 1BE7 B545 CDF3 FD0E
Pages: [1]
  Print  
 
Jump to:  

Sponsored by , a Bitcoin-accepting VPN.
Powered by MySQL Powered by PHP Powered by SMF 1.1.19 | SMF © 2006-2009, Simple Machines Valid XHTML 1.0! Valid CSS!