Bitcoin Forum
May 04, 2024, 12:49:55 AM *
News: Latest Bitcoin Core release: 27.0 [Torrent]
 
   Home   Help Search Login Register More  
Pages: [1] 2 »  All
  Print  
Author Topic: Statistical analysis of Bitcoin public key distribution  (Read 4816 times)
🏰 TradeFortress 🏰 (OP)
Bitcoin Veteran
VIP
Legendary
*
Offline Offline

Activity: 1316
Merit: 1043

👻


View Profile
September 23, 2014, 02:07:10 PM
 #1

Has anyone performed statistical analysis of the Bitcoin public keys on the blockchain? It might be useful for identifying if there may be a client with flawed key generation out in the wild.
No Gods or Kings. Only Bitcoin
Advertised sites are not endorsed by the Bitcoin Forum. They may be unsafe, untrustworthy, or illegal in your jurisdiction.
amaclin
Legendary
*
Offline Offline

Activity: 1260
Merit: 1019


View Profile
September 23, 2014, 02:14:29 PM
 #2

Statistics of public keys can not be converted into the statistics of private keys
Remember remember the 5th of November
Legendary
*
Offline Offline

Activity: 1862
Merit: 1011

Reverse engineer from time to time


View Profile
September 23, 2014, 02:24:54 PM
 #3

Every single day people have bots running that identify transactions that are weird and spendable by anyone or weak brainwallets or re-used K value.

BTC:1AiCRMxgf1ptVQwx6hDuKMu4f7F27QmJC2
BurtW
Legendary
*
Offline Offline

Activity: 2646
Merit: 1131

All paid signature campaigns should be banned.


View Profile WWW
September 24, 2014, 02:29:23 AM
 #4

I understand that the statistical distribution of public key values (points) tells us next to nothing about the statistical distribution of private keys however I am still curious as to whether a statistical analysis of the public key values has ever been done.

Even farther from the private key distribution:  has anyone ever done an analysis of the statistical distribution of the Bitcoin addresses?

Our family was terrorized by Homeland Security.  Read all about it here:  http://www.jmwagner.com/ and http://www.burtw.com/  Any donations to help us recover from the $300,000 in legal fees and forced donations to the Federal Asset Forfeiture slush fund are greatly appreciated!
amaclin
Legendary
*
Offline Offline

Activity: 1260
Merit: 1019


View Profile
September 24, 2014, 04:55:58 AM
 #5

Quote
Even farther from the private key distribution:  has anyone ever done an analysis of the statistical distribution of the Bitcoin addresses?

What kind of statistics are you interested in?
Are you interested in the statistics of all possible addresses (2160)?
Or used in blockchain [~50 mln]? Or with non-zero balance [10 mln]?

There are a lot of vanity addresses and artificially created addresses (created not from privatekey->publickey->adderss sequence) in blockchain. I do not see any reason to calculate anything thom this garbage. But if you pay for it...
SoldierKid
Newbie
*
Offline Offline

Activity: 14
Merit: 0


View Profile
September 24, 2014, 06:48:25 AM
 #6

My question -- how are you to obtain the public key of an address? Addresses aren't public keys, but the hashed/formatted version of public keys.

The only way to obtain public keys would be message signing afaik.

Of course, that's with the information I found here: https://en.bitcoin.it/wiki/Technical_background_of_version_1_Bitcoin_addresses
amaclin
Legendary
*
Offline Offline

Activity: 1260
Merit: 1019


View Profile
September 24, 2014, 06:58:25 AM
 #7

My question -- how are you to obtain the public key of an address?
Yes, it is not possible by default.
But if the address has spendings (any number of outgoing transactions) the public key is in blockchain.

For example. Let us take the address 1JwSSubhmg6iPtRjtyqhUYYH7bZg3Lfy1T
Then take one of spendings from this address
https://blockchain.info/tx/c678a097402531ae2ecaec13bb5777338268af0a8fad3d399bd0ae2e64f2ad15
scroll down...
Do you see the long line "0478d430274f8c5ec1321338151e9f2..."
This is the public key for address 1JwSSubhmg6iPtRjtyqhUYYH7bZg3Lfy1T

Most of addresses have spendings. So, we can obtain millions of used public keys
This will be more that enough for statistical analysis
SoldierKid
Newbie
*
Offline Offline

Activity: 14
Merit: 0


View Profile
September 24, 2014, 07:01:33 AM
 #8

My question -- how are you to obtain the public key of an address?
Yes, it is not possible by default.
But if the address has spendings (any number of outgoing transactions) the public key is in blockchain.

For example. Let us take the address 1JwSSubhmg6iPtRjtyqhUYYH7bZg3Lfy1T
Then take one of spendings from this address
https://blockchain.info/tx/c678a097402531ae2ecaec13bb5777338268af0a8fad3d399bd0ae2e64f2ad15
scroll down...
Do you see the long line "0478d430274f8c5ec1321338151e9f2..."
This is public key.

Most of addresses have spendings. So, we can obtain millions of used public keys


Ah, brilliant! Thanks for the information. So, technically, someone would be only able to analyze personally generated addresses or addresses that have sent coins. That's really neat to know.
amaclin
Legendary
*
Offline Offline

Activity: 1260
Merit: 1019


View Profile
September 24, 2014, 07:04:01 AM
 #9

Quote
Ah, brilliant! Thanks for the information. So, technically, someone would be only able to analyze personally generated addresses or addresses that have sent coins. That's really neat to know.
Yes. And this is a reason (one of) not to re-use addresses.

UPD: there is no such thing as "personally generated addresses"
Address is either in blockchain or not exists  Grin
SoldierKid
Newbie
*
Offline Offline

Activity: 14
Merit: 0


View Profile
September 24, 2014, 07:09:19 AM
 #10

UPD: there is no such thing as "personally generated addresses"
Address is either in blockchain or not exists  Grin

By personally generated addresses, I meant addresses that $username generated. The ones where he has the private key / public key pair. Sorry for not being very clear about that. I suppose it'd be a bit silly for him to include addresses that he already has the private key of.

Technically, I could generate a key pair and have a totally legit address and it not be part of the blockchain (offline wallet). Unless you're saying every address possible is already part of the blockchain.
amaclin
Legendary
*
Offline Offline

Activity: 1260
Merit: 1019


View Profile
September 24, 2014, 07:13:45 AM
 #11

Quote
Technically, I could generate a key pair and have a totally legit address and it not be part of the blockchain (offline wallet). Unless you're saying every address possible is already part of the blockchain.

Yes. But in the very beginning of this topic the words were "... statistical analysis of the Bitcoin public keys on the blockchain ..."
Tell me what you want to achieve, if you perform analysis on the data you generated yourself?  Grin
SoldierKid
Newbie
*
Offline Offline

Activity: 14
Merit: 0


View Profile
September 24, 2014, 07:19:42 AM
 #12

Quote
Technically, I could generate a key pair and have a totally legit address and it not be part of the blockchain (offline wallet). Unless you're saying every address possible is already part of the blockchain.

Yes. But in the very beginning of this topic the words were "... statistical analysis of the Bitcoin public keys on the blockchain ..."
Tell me what you want to achieve, if you perform analysis on the data you generated yourself?  Grin

Just a proof of concept of finding an address generation flaw with a few hundred thousand of addresses between each "flawed" generated address.

I think we've gotten pretty off-topic here, but I've learnt a lot about the blockchain and more information on bitcoin overall, so thanks a ton!
keystroke
Hero Member
*****
Offline Offline

Activity: 900
Merit: 1014


advocate of a cryptographic attack on the globe


View Profile
September 25, 2014, 12:32:07 AM
 #13

That's funny, I was just wondering about this myself. I'd be interested to see the results of both address distribution and public key distribution.

"The difference between a castle and a prison is only a question of who holds the keys."
Reynaldo
Legendary
*
Offline Offline

Activity: 1143
Merit: 1000


View Profile
September 25, 2014, 01:18:11 AM
 #14

I dont know which algorithm it is using, if its gpg, should be totally random and nothing could be found if its not then the flaw might be from GPG and about the entropy that each system used to generate the key.

In theory you could generate an address that is already done but then the private key would not be the same, therefor you wouldnt be able to spend the funds.. correct me if im wrong.
DannyHamilton
Legendary
*
Offline Offline

Activity: 3388
Merit: 4615



View Profile
September 25, 2014, 03:41:35 AM
 #15

I dont know which algorithm it is using,

Clearly.  Please explain how random guessing and tossing out words that you think might be associated with cryptography would be helpful at all?

if its gpg,

It's not.  The public key is ECDSA using the secp256k1 curve. The bitcoin address is a RIPEMD-160 hash of a SHA-256 hash.

should be totally random and nothing could be found if its not then the flaw might be from GPG and about the entropy that each system used to generate the key.

The entropy is in the choosing of the private key.  The questions being asked, as I understand them, are if a statistical analysis has been attempted to determine if there are any flaws in any of ECDSA, SHA-256, or RIPEMD-160 that might lead to discovery that either public keys or bitcoin addresses are less random than believed.

In theory you could generate an address that is already done but then the private key would not be the same, therefor you wouldnt be able to spend the funds.. correct me if im wrong.

You are wrong.  If you have a private key that generates a particular bitcoin address then you can spend all bitcoins that are received at that address, even if that private key isn't the same private key that was used by the intended recipient to originally generate the address.

(Note: The only way that you "could generate an address that is already done" with a different private key is if there is a flaw in ECDSA, SHA-256, or RIPEMD-160.  There are currently no known flaws that would cause this, and with nearly 6 years of use bitcoin has not ever had a recorded instance of this happening.)
Rupert
Newbie
*
Offline Offline

Activity: 17
Merit: 0


View Profile
September 25, 2014, 03:04:26 PM
 #16

Maybe it is possible to use pub keys as random numbers? Maybe it is possible to use Bitcoin as random generator?
BurtW
Legendary
*
Offline Offline

Activity: 2646
Merit: 1131

All paid signature campaigns should be banned.


View Profile WWW
September 25, 2014, 05:17:20 PM
 #17

I know this is silly so don't tell me it is silly.

I threw the code snippet together quickly so don't critique the code, it is for illustration purposes only.

I am willing to throw a small amount of BTC your way if you do this for me (I am too busy).  Please PM me with your bid and a super short description of what technology you would use (Java, C++, database, etc.)  to produce the result I want.

Here is what I want:

Scan through the entire blockchain and sort all the Bitcoin addresses found into "bins" by the first N letters.  The number of possible bins is related to N as follows:

Code:
//             Number of 
//    N    possible bins
//    -   --------------  
//    2               58
//    3            3,364
//    4          195,112
//    5       11,316,496
//    6      656,356,768
//    7   38,068,692,544

I think an N of 4 will be good as a first pass.  If this turns out to be interesting then an N of 5 might be interesting.

For each bin 1111 through 1zzz I want to know how many addresses exit in that bin and how many of those addresses have positive balances.  Here is a short code snippet to illustrate what I am looking for.  

Code:
#include <iostream>

using namespace std;

static const int N = 4;

//             Number of
//    N    possible bins
//    -   --------------  
//    2               58
//    3            3,364
//    4          195,112
//    5       11,316,496
//    6      656,356,768
//    7   38,068,692,544

char base58[] = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

int main (int argc, char *argv[])
{
    cout << argv[1] << endl;

    if (argv[1][0] == '1') {

        int bin = 0;

        for (int i=1; i<N; ++i) {

            for (int j=0; j<58; ++j) {

                if (argv[1][i] == base58[j]) {
                    bin = bin * 58 + j;
                    break;
                }
            }
        }

        cout << "Increment bin " << bin << endl;

        // Here you would increment AdddressExists[bin]

        // Here you would check to see if the address has a balance and if so
        // increment HasBalance[bin]

    } else {

        cout << "Did not start with 1" << endl;
    }

    return 0;
}

At the end of the program produce a CSV file something like this:

Code:
0,"1111",5,2
1,"1112",0,0
...
195110,"1zzy",1,0
195111,"1zzz",2,1

Our family was terrorized by Homeland Security.  Read all about it here:  http://www.jmwagner.com/ and http://www.burtw.com/  Any donations to help us recover from the $300,000 in legal fees and forced donations to the Federal Asset Forfeiture slush fund are greatly appreciated!
Undecidable
Newbie
*
Offline Offline

Activity: 16
Merit: 0


View Profile
October 01, 2014, 09:54:26 PM
Last edit: October 01, 2014, 10:08:01 PM by Undecidable
 #18

Here's a CSV through block 322933 https://mega.co.nz/#!cRVEUQJA!kkqcVwo6g47hHCA1IkA_JX2r7JHXC4iNBwkRVThZavs

Edit: BTW, these stats only cover p2pkh-type outputs.
BurtW
Legendary
*
Offline Offline

Activity: 2646
Merit: 1131

All paid signature campaigns should be banned.


View Profile WWW
October 01, 2014, 11:07:28 PM
 #19

Here's a CSV through block 322933 https://mega.co.nz/#!cRVEUQJA!kkqcVwo6g47hHCA1IkA_JX2r7JHXC4iNBwkRVThZavs

Edit: BTW, these stats only cover p2pkh-type outputs.
THANKS!  I wonder about the data, perhapse there is a couple of bugs?

1)  The sum of the first column (number of address ever in each bin) is 48,297,867 and that seem right.  However the sum of the second column, the number of active addresses, is only 345,558.  That seems low to me.  That would indicate that all of the BTC in existence are stored on only 345,558 unique addresses?

2) This is a nit but there are only 195,111 rows.  The last row for bin 1zzz is missing.

I did a quick sort on the number of currently active addresses in each bin and the top ten are:

Code:
 
 Index    Bin    Ever   Now
------   ----   -----   ---
 32543   1Ag6     852   106
 32542   1Ag5     842    82
  9852   13vs     760    71
 32541   1Ag4     879    56
 23402   17xV     666    51
 32593   1Agx     837    48
   684   11Co      45    35
 35159   1BTC    1551    34
     0   1111     374    33
  1548   11Th      50    32

There does seem to be a lot of them in the 1Ag range but this is easily explained by Casascius because the way he created his coins skews the addresses into certain bins.  For example he created 1786 addresses starting with 1Ag in this batch of coins alone:

http://casascius.uberbills.com/?type=1&status=active

I also did a quick sort on the number of addresses ever in each bin and the top 20 are:

Code:
 
 Index    Bin    Ever   Now
------   ----   -----   ---
116804   1bit   12043     6
 35159   1BTC    1551    34
 68892   1MUo    1148     5
 36069   1Bit     892    27
 36712   1Buy     883     1
 32541   1Ag4     879    56
 32571   1Aga     869     4
 32572   1Agb     867     8
 32539   1Ag2     866     3
 32592   1Agw     863     7
 32538   1Ag1     859     7
 32595   1Agz     856     5
 32543   1Ag6     852   106
 32594   1Agy     846    25
 32542   1Ag5     842    82
 32593   1Agx     837    48
 75469   1PSC     834     9
 66967   1Luc     819    16
 37550   1CAR     813    13
 34187   1BAS     807    31

Again, vanity addresses obviously skew the distribution.

Our family was terrorized by Homeland Security.  Read all about it here:  http://www.jmwagner.com/ and http://www.burtw.com/  Any donations to help us recover from the $300,000 in legal fees and forced donations to the Federal Asset Forfeiture slush fund are greatly appreciated!
hhanh00
Sr. Member
****
Offline Offline

Activity: 467
Merit: 266


View Profile
October 02, 2014, 02:21:53 AM
 #20

I found different results.

Code:
addr  count      total balance
1BiT 11784 990.72856
1Btc 1015 2415.5241782
1MUo 614 169.74258679
1tip 588 9.21681564
1CaR 557 479.44438884
1Pay 511 502.95050021
1MaR 506 242.68727996
1AGa 496 835.41160235
1AgZ 463 429.46682356
1AGB 452 470.97131551
1Agw 443 515.40205199
1CHA 428 1142.89394438
1AGY 427 1293.95860474
1BaN 409 650.5419544
1CaN 402 435.30669108
1CAs 391 2208.6754669
1AgX 390 459.49381862
1BrA 367 229.65373024
...

For a total of 3.4 mil addresses

Pages: [1] 2 »  All
  Print  
 
Jump to:  

Powered by MySQL Powered by PHP Powered by SMF 1.1.19 | SMF © 2006-2009, Simple Machines Valid XHTML 1.0! Valid CSS!