Bitcoin Forum

Bitcoin => Development & Technical Discussion => Topic started by: 🏰 TradeFortress 🏰 on September 23, 2014, 02:07:10 PM



Title: Statistical analysis of Bitcoin public key distribution
Post by: 🏰 TradeFortress 🏰 on September 23, 2014, 02:07:10 PM
Has anyone performed statistical analysis of the Bitcoin public keys on the blockchain? It might be useful for identifying if there may be a client with flawed key generation out in the wild.


Title: Re: Statistical analysis of Bitcoin public key distribution
Post by: amaclin on September 23, 2014, 02:14:29 PM
Statistics of public keys can not be converted into the statistics of private keys


Title: Re: Statistical analysis of Bitcoin public key distribution
Post by: Remember remember the 5th of November on September 23, 2014, 02:24:54 PM
Every single day people have bots running that identify transactions that are weird and spendable by anyone or weak brainwallets or re-used K value.


Title: Re: Statistical analysis of Bitcoin public key distribution
Post by: BurtW on September 24, 2014, 02:29:23 AM
I understand that the statistical distribution of public key values (points) tells us next to nothing about the statistical distribution of private keys however I am still curious as to whether a statistical analysis of the public key values has ever been done.

Even farther from the private key distribution:  has anyone ever done an analysis of the statistical distribution of the Bitcoin addresses?


Title: Re: Statistical analysis of Bitcoin public key distribution
Post by: amaclin on September 24, 2014, 04:55:58 AM
Quote
Even farther from the private key distribution:  has anyone ever done an analysis of the statistical distribution of the Bitcoin addresses?

What kind of statistics are you interested in?
Are you interested in the statistics of all possible addresses (2160)?
Or used in blockchain [~50 mln]? Or with non-zero balance [10 mln]?

There are a lot of vanity addresses and artificially created addresses (created not from privatekey->publickey->adderss sequence) in blockchain. I do not see any reason to calculate anything thom this garbage. But if you pay for it...


Title: Re: Statistical analysis of Bitcoin public key distribution
Post by: SoldierKid on September 24, 2014, 06:48:25 AM
My question -- how are you to obtain the public key of an address? Addresses aren't public keys, but the hashed/formatted version of public keys.

The only way to obtain public keys would be message signing afaik.

Of course, that's with the information I found here: https://en.bitcoin.it/wiki/Technical_background_of_version_1_Bitcoin_addresses


Title: Re: Statistical analysis of Bitcoin public key distribution
Post by: amaclin on September 24, 2014, 06:58:25 AM
My question -- how are you to obtain the public key of an address?
Yes, it is not possible by default.
But if the address has spendings (any number of outgoing transactions) the public key is in blockchain.

For example. Let us take the address 1JwSSubhmg6iPtRjtyqhUYYH7bZg3Lfy1T
Then take one of spendings from this address
https://blockchain.info/tx/c678a097402531ae2ecaec13bb5777338268af0a8fad3d399bd0ae2e64f2ad15
scroll down...
Do you see the long line "0478d430274f8c5ec1321338151e9f2..."
This is the public key for address 1JwSSubhmg6iPtRjtyqhUYYH7bZg3Lfy1T

Most of addresses have spendings. So, we can obtain millions of used public keys
This will be more that enough for statistical analysis


Title: Re: Statistical analysis of Bitcoin public key distribution
Post by: SoldierKid on September 24, 2014, 07:01:33 AM
My question -- how are you to obtain the public key of an address?
Yes, it is not possible by default.
But if the address has spendings (any number of outgoing transactions) the public key is in blockchain.

For example. Let us take the address 1JwSSubhmg6iPtRjtyqhUYYH7bZg3Lfy1T
Then take one of spendings from this address
https://blockchain.info/tx/c678a097402531ae2ecaec13bb5777338268af0a8fad3d399bd0ae2e64f2ad15
scroll down...
Do you see the long line "0478d430274f8c5ec1321338151e9f2..."
This is public key.

Most of addresses have spendings. So, we can obtain millions of used public keys


Ah, brilliant! Thanks for the information. So, technically, someone would be only able to analyze personally generated addresses or addresses that have sent coins. That's really neat to know.


Title: Re: Statistical analysis of Bitcoin public key distribution
Post by: amaclin on September 24, 2014, 07:04:01 AM
Quote
Ah, brilliant! Thanks for the information. So, technically, someone would be only able to analyze personally generated addresses or addresses that have sent coins. That's really neat to know.
Yes. And this is a reason (one of) not to re-use addresses.

UPD: there is no such thing as "personally generated addresses"
Address is either in blockchain or not exists  ;D


Title: Re: Statistical analysis of Bitcoin public key distribution
Post by: SoldierKid on September 24, 2014, 07:09:19 AM
UPD: there is no such thing as "personally generated addresses"
Address is either in blockchain or not exists  ;D

By personally generated addresses, I meant addresses that $username generated. The ones where he has the private key / public key pair. Sorry for not being very clear about that. I suppose it'd be a bit silly for him to include addresses that he already has the private key of.

Technically, I could generate a key pair and have a totally legit address and it not be part of the blockchain (offline wallet). Unless you're saying every address possible is already part of the blockchain.


Title: Re: Statistical analysis of Bitcoin public key distribution
Post by: amaclin on September 24, 2014, 07:13:45 AM
Quote
Technically, I could generate a key pair and have a totally legit address and it not be part of the blockchain (offline wallet). Unless you're saying every address possible is already part of the blockchain.

Yes. But in the very beginning of this topic the words were "... statistical analysis of the Bitcoin public keys on the blockchain ..."
Tell me what you want to achieve, if you perform analysis on the data you generated yourself?  ;D


Title: Re: Statistical analysis of Bitcoin public key distribution
Post by: SoldierKid on September 24, 2014, 07:19:42 AM
Quote
Technically, I could generate a key pair and have a totally legit address and it not be part of the blockchain (offline wallet). Unless you're saying every address possible is already part of the blockchain.

Yes. But in the very beginning of this topic the words were "... statistical analysis of the Bitcoin public keys on the blockchain ..."
Tell me what you want to achieve, if you perform analysis on the data you generated yourself?  ;D

Just a proof of concept of finding an address generation flaw with a few hundred thousand of addresses between each "flawed" generated address.

I think we've gotten pretty off-topic here, but I've learnt a lot about the blockchain and more information on bitcoin overall, so thanks a ton!


Title: Re: Statistical analysis of Bitcoin public key distribution
Post by: keystroke on September 25, 2014, 12:32:07 AM
That's funny, I was just wondering about this myself. I'd be interested to see the results of both address distribution and public key distribution.


Title: Re: Statistical analysis of Bitcoin public key distribution
Post by: Reynaldo on September 25, 2014, 01:18:11 AM
I dont know which algorithm it is using, if its gpg, should be totally random and nothing could be found if its not then the flaw might be from GPG and about the entropy that each system used to generate the key.

In theory you could generate an address that is already done but then the private key would not be the same, therefor you wouldnt be able to spend the funds.. correct me if im wrong.


Title: Re: Statistical analysis of Bitcoin public key distribution
Post by: DannyHamilton on September 25, 2014, 03:41:35 AM
I dont know which algorithm it is using,

Clearly.  Please explain how random guessing and tossing out words that you think might be associated with cryptography would be helpful at all?

if its gpg,

It's not.  The public key is ECDSA using the secp256k1 curve. The bitcoin address is a RIPEMD-160 hash of a SHA-256 hash.

should be totally random and nothing could be found if its not then the flaw might be from GPG and about the entropy that each system used to generate the key.

The entropy is in the choosing of the private key.  The questions being asked, as I understand them, are if a statistical analysis has been attempted to determine if there are any flaws in any of ECDSA, SHA-256, or RIPEMD-160 that might lead to discovery that either public keys or bitcoin addresses are less random than believed.

In theory you could generate an address that is already done but then the private key would not be the same, therefor you wouldnt be able to spend the funds.. correct me if im wrong.

You are wrong.  If you have a private key that generates a particular bitcoin address then you can spend all bitcoins that are received at that address, even if that private key isn't the same private key that was used by the intended recipient to originally generate the address.

(Note: The only way that you "could generate an address that is already done" with a different private key is if there is a flaw in ECDSA, SHA-256, or RIPEMD-160.  There are currently no known flaws that would cause this, and with nearly 6 years of use bitcoin has not ever had a recorded instance of this happening.)


Title: Re: Statistical analysis of Bitcoin public key distribution
Post by: Rupert on September 25, 2014, 03:04:26 PM
Maybe it is possible to use pub keys as random numbers? Maybe it is possible to use Bitcoin as random generator?


Title: Re: Statistical analysis of Bitcoin public key distribution
Post by: BurtW on September 25, 2014, 05:17:20 PM
I know this is silly so don't tell me it is silly.

I threw the code snippet together quickly so don't critique the code, it is for illustration purposes only.

I am willing to throw a small amount of BTC your way if you do this for me (I am too busy).  Please PM me with your bid and a super short description of what technology you would use (Java, C++, database, etc.)  to produce the result I want.

Here is what I want:

Scan through the entire blockchain and sort all the Bitcoin addresses found into "bins" by the first N letters.  The number of possible bins is related to N as follows:

Code:
//             Number of 
//    N    possible bins
//    -   --------------  
//    2               58
//    3            3,364
//    4          195,112
//    5       11,316,496
//    6      656,356,768
//    7   38,068,692,544

I think an N of 4 will be good as a first pass.  If this turns out to be interesting then an N of 5 might be interesting.

For each bin 1111 through 1zzz I want to know how many addresses exit in that bin and how many of those addresses have positive balances.  Here is a short code snippet to illustrate what I am looking for.  

Code:
#include <iostream>

using namespace std;

static const int N = 4;

//             Number of
//    N    possible bins
//    -   --------------  
//    2               58
//    3            3,364
//    4          195,112
//    5       11,316,496
//    6      656,356,768
//    7   38,068,692,544

char base58[] = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

int main (int argc, char *argv[])
{
    cout << argv[1] << endl;

    if (argv[1][0] == '1') {

        int bin = 0;

        for (int i=1; i<N; ++i) {

            for (int j=0; j<58; ++j) {

                if (argv[1][i] == base58[j]) {
                    bin = bin * 58 + j;
                    break;
                }
            }
        }

        cout << "Increment bin " << bin << endl;

        // Here you would increment AdddressExists[bin]

        // Here you would check to see if the address has a balance and if so
        // increment HasBalance[bin]

    } else {

        cout << "Did not start with 1" << endl;
    }

    return 0;
}

At the end of the program produce a CSV file something like this:

Code:
0,"1111",5,2
1,"1112",0,0
...
195110,"1zzy",1,0
195111,"1zzz",2,1


Title: Re: Statistical analysis of Bitcoin public key distribution
Post by: Undecidable on October 01, 2014, 09:54:26 PM
Here's a CSV through block 322933 https://mega.co.nz/#!cRVEUQJA!kkqcVwo6g47hHCA1IkA_JX2r7JHXC4iNBwkRVThZavs (https://mega.co.nz/#!cRVEUQJA!kkqcVwo6g47hHCA1IkA_JX2r7JHXC4iNBwkRVThZavs)

Edit: BTW, these stats only cover p2pkh (https://bitcoin.org/en/developer-guide#term-p2pkh)-type outputs.


Title: Re: Statistical analysis of Bitcoin public key distribution
Post by: BurtW on October 01, 2014, 11:07:28 PM
Here's a CSV through block 322933 https://mega.co.nz/#!cRVEUQJA!kkqcVwo6g47hHCA1IkA_JX2r7JHXC4iNBwkRVThZavs (https://mega.co.nz/#!cRVEUQJA!kkqcVwo6g47hHCA1IkA_JX2r7JHXC4iNBwkRVThZavs)

Edit: BTW, these stats only cover p2pkh (https://bitcoin.org/en/developer-guide#term-p2pkh)-type outputs.
THANKS!  I wonder about the data, perhapse there is a couple of bugs?

1)  The sum of the first column (number of address ever in each bin) is 48,297,867 and that seem right.  However the sum of the second column, the number of active addresses, is only 345,558.  That seems low to me.  That would indicate that all of the BTC in existence are stored on only 345,558 unique addresses?

2) This is a nit but there are only 195,111 rows.  The last row for bin 1zzz is missing.

I did a quick sort on the number of currently active addresses in each bin and the top ten are:

Code:
 
 Index    Bin    Ever   Now
------   ----   -----   ---
 32543   1Ag6     852   106
 32542   1Ag5     842    82
  9852   13vs     760    71
 32541   1Ag4     879    56
 23402   17xV     666    51
 32593   1Agx     837    48
   684   11Co      45    35
 35159   1BTC    1551    34
     0   1111     374    33
  1548   11Th      50    32

There does seem to be a lot of them in the 1Ag range but this is easily explained by Casascius because the way he created his coins skews the addresses into certain bins.  For example he created 1786 addresses starting with 1Ag in this batch of coins alone:

http://casascius.uberbills.com/?type=1&status=active

I also did a quick sort on the number of addresses ever in each bin and the top 20 are:

Code:
 
 Index    Bin    Ever   Now
------   ----   -----   ---
116804   1bit   12043     6
 35159   1BTC    1551    34
 68892   1MUo    1148     5
 36069   1Bit     892    27
 36712   1Buy     883     1
 32541   1Ag4     879    56
 32571   1Aga     869     4
 32572   1Agb     867     8
 32539   1Ag2     866     3
 32592   1Agw     863     7
 32538   1Ag1     859     7
 32595   1Agz     856     5
 32543   1Ag6     852   106
 32594   1Agy     846    25
 32542   1Ag5     842    82
 32593   1Agx     837    48
 75469   1PSC     834     9
 66967   1Luc     819    16
 37550   1CAR     813    13
 34187   1BAS     807    31

Again, vanity addresses obviously skew the distribution.


Title: Re: Statistical analysis of Bitcoin public key distribution
Post by: hhanh00 on October 02, 2014, 02:21:53 AM
I found different results.

Code:
addr  count      total balance
1BiT 11784 990.72856
1Btc 1015 2415.5241782
1MUo 614 169.74258679
1tip 588 9.21681564
1CaR 557 479.44438884
1Pay 511 502.95050021
1MaR 506 242.68727996
1AGa 496 835.41160235
1AgZ 463 429.46682356
1AGB 452 470.97131551
1Agw 443 515.40205199
1CHA 428 1142.89394438
1AGY 427 1293.95860474
1BaN 409 650.5419544
1CaN 402 435.30669108
1CAs 391 2208.6754669
1AgX 390 459.49381862
1BrA 367 229.65373024
...

For a total of 3.4 mil addresses


Title: Re: Statistical analysis of Bitcoin public key distribution
Post by: hhanh00 on October 02, 2014, 02:39:27 AM
Some other queries:
Top 10 richest addresses:

Code:
select address, balance / 1e8 from balances order by balance desc limit 1,10;
1i7cZdoE9NcHSdAL5eGjmTJbBVqeQDwgw 144341.53393243
13Df4x5nQo7boLWHxQCbJzobN5gUNT65Hh 134033.21518705
1FeexV6bAHb8ybZjqQMjJrcCrHGW9sb6uF 79957.05831838
1HQ3Go3ggs8pFnXuHVHRytPCq5fGG8Hbhx 69471.09995224
16ZbpCEyVVdqu8VycWR8thUL2Rd9JnjzHt 61693.55853673
1PB4xXUFyy4kSNqroCBVaQuCuw9VcN3be4 61544.25018469
1PnMfRF2enSZnR6JSexxBHuQnxG8Vo5FVK 61534.65308581
18f1yugoAJuXcHAbsuRVLQC9TezJ6iVRLp 61517.02586555
1DiHDQMPFu4p84rkLn6Majj2LCZZZRQUaa 61442.99572266
1AhTjUMztCihiTyA4K6E3QEpobjWLwKhkR 61387.07600804

Code:
select count(*) from balances;
3460367

The median is only 1 mbtc
Code:
select address, balance / 1e8 from balances order by balance asc limit 1730183,1;
1H6c1fDEzJBhvViHhtrW5L9YTuSfRNp44w 0.001

Total balance
Code:
select sum(balance) / 1e8 from balances;
13316310.28675084

The top 18 (0.0005 %) addresses own 10% of the total
Code:
select sum(balance) / 1e8 from (select balance from balances  order by balance desc limit 18) as T;
1345428.14052543

The top 3000 (0.09%) own more than half
Code:
select sum(balance) / 1e8 from (select balance from balances  order by balance desc limit 3000) as T;
6820871.23187743



Title: Re: Statistical analysis of Bitcoin public key distribution
Post by: Reynaldo on October 02, 2014, 03:20:48 AM
I dont know which algorithm it is using,

Clearly.  Please explain how random guessing and tossing out words that you think might be associated with cryptography would be helpful at all?

if its gpg,

It's not.  The public key is ECDSA using the secp256k1 curve. The bitcoin address is a RIPEMD-160 hash of a SHA-256 hash.

should be totally random and nothing could be found if its not then the flaw might be from GPG and about the entropy that each system used to generate the key.

The entropy is in the choosing of the private key.  The questions being asked, as I understand them, are if a statistical analysis has been attempted to determine if there are any flaws in any of ECDSA, SHA-256, or RIPEMD-160 that might lead to discovery that either public keys or bitcoin addresses are less random than believed.

In theory you could generate an address that is already done but then the private key would not be the same, therefor you wouldnt be able to spend the funds.. correct me if im wrong.

You are wrong.  If you have a private key that generates a particular bitcoin address then you can spend all bitcoins that are received at that address, even if that private key isn't the same private key that was used by the intended recipient to originally generate the address.

(Note: The only way that you "could generate an address that is already done" with a different private key is if there is a flaw in ECDSA, SHA-256, or RIPEMD-160.  There are currently no known flaws that would cause this, and with nearly 6 years of use bitcoin has not ever had a recorded instance of this happening.)

Thanks alot, this clears alot of things to me :D


Title: Re: Statistical analysis of Bitcoin public key distribution
Post by: Phrenico on October 02, 2014, 03:34:56 PM

You are wrong.  If you have a private key that generates a particular bitcoin address then you can spend all bitcoins that are received at that address, even if that private key isn't the same private key that was used by the intended recipient to originally generate the address.

(Note: The only way that you "could generate an address that is already done" with a different private key is if there is a flaw in ECDSA, SHA-256, or RIPEMD-160.  There are currently no known flaws that would cause this, and with nearly 6 years of use bitcoin has not ever had a recorded instance of this happening.)

Correct my misunderstanding then:

Private keys are intended to be one-to-one with public keys, so that would certainly be a flaw in ECDSA if two private keys correspond to one public key, but since you turn the 256 bit public key into a 160 bit digest, it would just be incredibly unlikely, not impossible, for an ideal hash function to map two different inputs 256 bit to a single 160 bit output.

What's wrong with my understanding?


Title: Re: Statistical analysis of Bitcoin public key distribution
Post by: hhanh00 on October 02, 2014, 03:51:30 PM
1. The hash function maps billions of 256 bits input to the same value.  It's just that it's not computably easy to find a collision.
2. Bitcoin verifies that you can spend from an output by checking that the public key that you provide has the same hash the address and that you have the private key.
In other words, when the money was sent out, only the hash was given. To claim the money, you show the public key.


Title: Re: Statistical analysis of Bitcoin public key distribution
Post by: DannyHamilton on October 02, 2014, 04:04:16 PM
Correct my misunderstanding then:

Private keys are intended to be one-to-one with public keys, so that would certainly be a flaw in ECDSA if two private keys correspond to one public key, but since you turn the 256 bit public key into a 160 bit digest, it would just be incredibly unlikely, not impossible, for an ideal hash function to map two different inputs 256 bit to a single 160 bit output.

What's wrong with my understanding?

There is nothing wrong with your understanding.

When I say:
the only way you "could" do something

I mean it in the same way as when I say:
"The only way all the air molecules in the room could spontaneously collect in one corner instantly suffocating everyone in the room is if there is a flaw in our current understanding of physics."

Sure, there is an "incredibly unlikely, not impossible" probability of it happening, but in terms that the typical human being understands it can't happen.

Meanwhile when you say "not impossible", you are simply using a non-zero probability number (regardless of how extremely small that number is) as an indication of "possibility".


Title: Re: Statistical analysis of Bitcoin public key distribution
Post by: amaclin on October 02, 2014, 04:05:23 PM
Quote
Private keys are intended to be one-to-one with public keys, so that would certainly be a flaw in ECDSA if two private keys correspond to one public key
It will be flaw in basic arithmetic, not in ECDSA.
X * 7 = 28
How many values of X do you know? Two? Give me both, please!

Quote
but since you turn the 256 bit public key into a 160 bit digest, it would just be incredibly unlikely, not impossible, for an ideal hash function to map two different inputs 256 bit to a single 160 bit output.

What's wrong with my understanding?
That is correct. There is 296 addresses for each private key. No problem.

UPD: sorry, this sentence has mistake. see replies below.


Title: Re: Statistical analysis of Bitcoin public key distribution
Post by: hhanh00 on October 02, 2014, 04:21:02 PM
Quote
Private keys are intended to be one-to-one with public keys, so that would certainly be a flaw in ECDSA if two private keys correspond to one public key
It will be flaw in basic arithmetic, not in ECDSA.
X * 7 = 28
How many values of X do you know? Two? Give me both, please!
Elliptic curve mathematics is much more complicated than this. Besides, there are technically multiple values because the EC forms a cyclic group but we choose the normalized value. anyway Why the sarcastic comment?


Title: Re: Statistical analysis of Bitcoin public key distribution
Post by: BurtW on October 02, 2014, 08:11:53 PM
Quote
Private keys are intended to be one-to-one with public keys, so that would certainly be a flaw in ECDSA if two private keys correspond to one public key
It will be flaw in basic arithmetic, not in ECDSA.
X * 7 = 28
How many values of X do you know? Two? Give me both, please!

Quote
but since you turn the 256 bit public key into a 160 bit digest, it would just be incredibly unlikely, not impossible, for an ideal hash function to map two different inputs 256 bit to a single 160 bit output.

What's wrong with my understanding?
That is correct. There is 296 addresses for each private key. No problem.

No.  On average there approximately 296 key pairs mapped to each Bitcoin address.

Your statement has it backwards.

To spend the BTC at a Bitcoin address you need to have one of the approximately 296 possible key pairs that will properly map to that Bitcoin address.


Title: Re: Statistical analysis of Bitcoin public key distribution
Post by: amaclin on October 02, 2014, 08:57:41 PM
Quote
No.  On average there approximately 296 key pairs mapped to each Bitcoin address.
Your statement has it backwards.

Thank you. I am very sorry. Thinking in English is quite hard for me. Thanks for correcting my mistake