Bitcoin Forum

Bitcoin => Development & Technical Discussion => Topic started by: RoxxR on November 23, 2013, 11:13:00 AM



Title: How much rarer are short bitcoin addresses?
Post by: RoxxR on November 23, 2013, 11:13:00 AM
I'm not very good at math... but I have noticed that 33-character addresses are about 20 times less common than 34-character addresses. Similarly, are 32-character addresses 20 times less common than 33-character addresses (and so on)?

I would really like to see a math guru confirm this.


Title: Re: How much rarer are short bitcoin addresses?
Post by: michagogo on November 23, 2013, 04:44:52 PM
http://bitcoin.stackexchange.com/questions/2013/why-does-the-length-of-a-bitcoin-key-vary


Title: Re: How much rarer are short bitcoin addresses?
Post by: BurtW on November 23, 2013, 04:56:09 PM
http://bitcoin.stackexchange.com/questions/2013/why-does-the-length-of-a-bitcoin-key-vary

That does not answer the question about the distribution of Bitcoin address lengths.  I do remember seeing a post about that a long time ago.  Try the search function to see if you can find it.


Title: Re: How much rarer are short bitcoin addresses?
Post by: cr1776 on November 24, 2013, 07:31:18 AM
They just vary due to encoding, so (somewhat simplified) if you had two:
12345678
00098765  could be written as 98765

That doesn't quantify it, but this explains:

Quote
Most Bitcoin addresses are 34 characters. They consist of random digits and uppercase and lowercase letters, with the exception that the uppercase letter "O", uppercase letter "I", lowercase letter "l", and the number "0" are never used to prevent visual ambiguity.
Some Bitcoin addresses can be shorter than 34 characters (as few as 27 in theory) and still be valid. A significant percentage of Bitcoin addresses are only 33 characters, and some addresses may be even shorter. Every Bitcoin address stands for a number - somewhat like an account number. These shorter addresses are valid simply because they stand for numbers that happen to start with zeroes, and when the zeroes are omitted, the encoded address gets shorter.
Several of the characters inside a Bitcoin address are used as a checksum so that typographical errors can be automatically found and rejected. The checksum also allows Bitcoin software to confirm that a 33-character (or shorter) address is in fact valid and isn't simply an address with a missing character.



I'm not very good at math... but I have noticed that 33-character addresses are about 20 times less common than 34-character addresses. Similarly, are 32-character addresses 20 times less common than 33-character addresses (and so on)?

I would really like to see a math guru confirm this.


Title: Re: How much rarer are short bitcoin addresses?
Post by: piotr_n on November 24, 2013, 09:56:58 AM
I'm not very good at math... but I have noticed that 33-character addresses are about 20 times less common than 34-character addresses. Similarly, are 32-character addresses 20 times less common than 33-character addresses (and so on)?

I would really like to see a math guru confirm this.

It's as you say, except that the factor is 58, not 20.
Each one character shorter address should be 58 times rarer...


Title: Re: How much rarer are short bitcoin addresses?
Post by: flatfly on November 24, 2013, 10:30:07 AM
I'm not very good at math... but I have noticed that 33-character addresses are about 20 times less common than 34-character addresses. Similarly, are 32-character addresses 20 times less common than 33-character addresses (and so on)?

I would really like to see a math guru confirm this.

It's as you say, except that the factor is 58, not 20.
Each one character shorter address should be 58 times rarer...

Nope, the factor is not 58. I haven't done the math (not enough time for this now) but from my quick initial testing, the factor seems to be roughly 22.5, at least for 33-character addresses. Don't forget to take leading zeros into account.


Title: Re: How much rarer are short bitcoin addresses?
Post by: piotr_n on November 24, 2013, 11:05:17 AM
I'm not very good at math... but I have noticed that 33-character addresses are about 20 times less common than 34-character addresses. Similarly, are 32-character addresses 20 times less common than 33-character addresses (and so on)?

I would really like to see a math guru confirm this.

It's as you say, except that the factor is 58, not 20.
Each one character shorter address should be 58 times rarer...

Nope, the factor is not 58. I haven't done the math (not enough time for this now) but from my quick initial testing, the factor seems to be roughly 22.5, at least for 33-character addresses. Don't forget to take leading zeros into account.

You're right - sorry, my mistake.
The first byte is always zero, which means that the first factor is lower.
13.140625, if I did the math right... (58*58/256)
But any further shorter ones should use the factor 58 / char.


Title: Re: How much rarer are short bitcoin addresses?
Post by: piotr_n on November 24, 2013, 11:24:28 AM
I believe the proper calculation for the first factor comes from the value:

Code:
0x00FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF / 58 = 0x0469ee58469ee58469ee58469ee58469ee58469ee58469ee

And the chance of 0x0469ee58469ee58469ee58469ee58469ee58469ee58469ee having... something to do with mod/div 58 - is the first factor. :)

Something like that... Haven't had my fifth coffee yet today ;)


EDIT:
If this time I did not get it wrong again, it might go like this (python code):
Code:
>>> x = 0x00FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF / 58
>>> 1 / (1 - float(x - 58**31) / x)
23.337985918792462


Title: Re: How much rarer are short bitcoin addresses?
Post by: flatfly on November 24, 2013, 02:18:10 PM
I believe the proper calculation for the first factor comes from the value:

Code:
0x00FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF / 58 = 0x0469ee58469ee58469ee58469ee58469ee58469ee58469ee

And the chance of 0x0469ee58469ee58469ee58469ee58469ee58469ee58469ee having... something to do with mod/div 58 - is the first factor. :)

Something like that... Haven't had my fifth coffee yet today ;)


EDIT:
If this time I did not get it wrong again, it might go like this (python code):
Code:
>>> x = 0x00FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF / 58
>>> 1 / (1 - float(x - 58**31) / x)
23.337985918792462

That looks much better :)
However the next factors are much, MUCH higher than 58, it seems. I haven't been able to figure it out completely, though. This question is trickier than it seems...


Title: Re: How much rarer are short bitcoin addresses?
Post by: piotr_n on November 24, 2013, 04:58:53 PM
However the next factors are much, MUCH higher than 58, it seems. I haven't been able to figure it out completely, though. This question is trickier than it seems...
I think I know why.

It is the part of satoshi's EncodeBase58() function:
Code:
    // Leading zeroes encoded as base58 zeros
    for (const unsigned char* p = pbegin; p < pend && *p == 0; p++)
        str += pszBase58[0];

So for every zero-byte at the beginning, the encode function adds '1' in front of the string - effectively forcing the string to come out longer.

Using this function, I've made a simulation on 20 millions random addresses and indeed not even once we get a string shorter than 32 characters. The statistical results are like this:
Code:
34 chars: ~96.026255%
33 chars: ~3.973735%
32 chars: ~0.00001%

But when we modify the EncodeBase58 function to break the loop (after putting the first '1' in front), then after the 20 million rounds we get stats like this:
Code:
34 chars: 95.714475%
33 chars: 4.210685%
32 chars: 0.073435%
31 chars: 0.00139%
30 chars: 0.000015%

So in other words: if you want to replace trailing '11..' with single '1..' - then you get the expected factor 58, for the even shorter addresses.

And the bottom line is that EncodeBase58 function in the satoshi client is stupidly implemented, because there is no reason whatsoever for an address to start with more than one '1', and even if it does you can just remove the extra ones and the address will still work just fine.


Title: Re: How much rarer are short bitcoin addresses?
Post by: flatfly on November 24, 2013, 10:37:36 PM
However the next factors are much, MUCH higher than 58, it seems. I haven't been able to figure it out completely, though. This question is trickier than it seems...
I think I know why.

It is the part of satoshi's EncodeBase58() function:
Code:
    // Leading zeroes encoded as base58 zeros
    for (const unsigned char* p = pbegin; p < pend && *p == 0; p++)
        str += pszBase58[0];

So for every zero-byte at the beginning, the encode function adds '1' in front of the string - effectively forcing the string to come out longer.

Using this function, I've made a simulation on 20 millions random addresses and indeed not even once we get a string shorter than 32 characters. The statistical results are like this:
Code:
34 chars: ~96.026255%
33 chars: ~3.973735%
32 chars: ~0.00001%

But when we modify the EncodeBase58 function to break the loop (after putting the first '1' in front), then after the 20 million rounds we get stats like this:
Code:
34 chars: 95.714475%
33 chars: 4.210685%
32 chars: 0.073435%
31 chars: 0.00139%
30 chars: 0.000015%

So in other words: if you want to replace trailing '11..' with single '1..' - then you get the expected factor 58, for the even shorter addresses.

And the bottom line is that EncodeBase58 function in the satoshi client is stupidly implemented, because there is no reason whatsoever for an address to start with more than one '1', and even if it does you can just remove the extra ones and the address will still work just fine.

Very interesting. Thanks for running the tests! Perhaps the function was coded this way to avoid confusing newbies with address lengths that vary too much? Hard to be sure... but i can't think of another reason.


Title: Re: How much rarer are short bitcoin addresses?
Post by: piotr_n on November 25, 2013, 09:33:58 AM
Perhaps the function was coded this way to avoid confusing newbies with address lengths that vary too much? Hard to be sure... but i can't think of another reason.
I believe it is to assure that the specific implementation of the decode function will always return 25 bytes, since no padding is being done there.

BTW, in theory there are eight possible "valid" addresses with the length of 26 characters:
Code:
11111111111111111111BZbvjr
11111111111111111111HeBAGj
11111111111111111111QekFQw
11111111111111111111UpYBrS
11111111111111111111g4hiWR
11111111111111111111jGyPM8
11111111111111111111o9FmEC
11111111111111111111ufYVpS

And I was wrong again; removing any of the trailing '1' will cause the satoshi client to assume the address as "invalid", even though it is technically quite valid, since e.g. both; 1BZbvjr and 11111111111111111111BZbvjr represent exactly the same 200-bit number (which a bitcoin address essentially is).

But IMHO it is a flaw in the implementation - e.g. my client considers addresses with redundant trailing '1' characters are valid and equal.
Hell you can even skip the first '1' - the 32-bit checksum that si there anyway should be just fine to assure no typos.


Title: Re: How much rarer are short bitcoin addresses?
Post by: RoxxR on November 25, 2013, 10:03:30 AM
Thanks for this interesting discussion.

Code:
34 chars: ~96.026255%
33 chars: ~3.973735%
32 chars: ~0.00001%

So if I understand this right, I've got my answer:
32 characters are 400,000 times rarer than 33 characters.

Are 31 characters also 400,000 times rarer than 32 characters, or does the factor differ again?


Title: Re: How much rarer are short bitcoin addresses?
Post by: piotr_n on November 25, 2013, 11:37:08 AM
So if I understand this right, I've got my answer:
32 characters are 400,000 times rarer than 33 characters.
Statistically, but in this case, there where were only two 32-chars long addresses generated withing the 20 million random samples so the number is not very precise.
It gives you an idea about the magnitude, but the 400,000 might just as well be 150,000 or 1,000,000

So if I understand this right, I've got my answer:
Are 31 characters also 400,000 times rarer than 32 characters, or does the factor differ again?
That's a good question...
Unfortunately I don't know the answer nor have resources to run such a long simulation.
Though back in my high school (which was very long ago), if I had asked my math teacher, she would have probable known how to calculate it :)