Bitcoin Forum
March 17, 2026, 10:14:39 PM *
Topic: Largest databases of pairs of private key and P2PKH addresses (Read 198 times)
Tfs (OP)
Newbie
Activity: 29
Merit: 67
March 09, 2026, 12:15:15 PM
#1

What are the largest databases of pairs (pk,adr) where pk is a private key (whatever format) and adr the corresponding P2PKH address?  I'm trying to estimate what kind of training data an AI would have.
Thanks.
ABCbits
Legendary
Activity: 3542
Merit: 9837
March 10, 2026, 08:16:31 AM
Merited by vapourminer (4), rat03gopoh (1)
#2

Quote from: Tfs
> What are the largest databases of pairs (pk,adr) where pk is a private key (whatever format) and adr the corresponding P2PKH address?

I doubt anyone actually bothers to create and share such data when they could just generate it themselves. If you search for it on Google, you will probably find a directory.io clone that generates the private keys on the fly when you open the page.

Quote from: Tfs
> I'm trying to estimate what kind of training data an AI would have.

Maybe this page can help you: https://en.bitcoin.it/wiki/Technical_background_of_version_1_Bitcoin_addresses. Note that the same private key in uncompressed and in compressed WIF form generates two different Bitcoin addresses.
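
The derivation described on that wiki page can be sketched in a few lines of Python. This is a minimal, unoptimized sketch and not wallet-grade code: the function names (scalar_mult, public_key, hash160, base58check, p2pkh_address) are made up for illustration, the curve arithmetic is not constant-time, and hashlib.new("ripemd160") is assumed to be available, which depends on the local OpenSSL build.

```python
import hashlib

# secp256k1 domain parameters (public, standard constants).
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)
B58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def point_add(a, b):
    """Add two curve points; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if a == b:
        lam = 3 * x1 * x1 * pow(2 * y1 % P, -1, P) % P       # tangent slope
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P      # chord slope
    x3 = (lam * lam - x1 - x2) % P
    return x3, (lam * (x1 - x3) - y1) % P

def scalar_mult(k, point=G):
    """Double-and-add multiplication (illustrative, not constant-time)."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, point)
        point = point_add(point, point)
        k >>= 1
    return result

def public_key(priv, compressed=True):
    """Serialize the public key for an integer private key in [1, N-1]."""
    x, y = scalar_mult(priv)
    if compressed:
        return bytes([2 + (y & 1)]) + x.to_bytes(32, "big")
    return b"\x04" + x.to_bytes(32, "big") + y.to_bytes(32, "big")

def hash160(data):
    """RIPEMD-160 of SHA-256; raises ValueError if OpenSSL lacks ripemd160."""
    return hashlib.new("ripemd160", hashlib.sha256(data).digest()).digest()

def base58check(payload):
    """Append a 4-byte double-SHA-256 checksum and encode in Base58."""
    data = payload + hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]
    num = int.from_bytes(data, "big")
    out = ""
    while num:
        num, rem = divmod(num, 58)
        out = B58[rem] + out
    pad = len(data) - len(data.lstrip(b"\x00"))   # leading zero bytes become '1'
    return "1" * pad + out

def p2pkh_address(priv, compressed=True):
    """Mainnet P2PKH address (version byte 0x00) for an integer private key."""
    return base58check(b"\x00" + hash160(public_key(priv, compressed)))
```

With these definitions, p2pkh_address(1, compressed=True) should give 1BgGZ9tcN4rm9KBzDn7KprQz87SZ26SAMH and p2pkh_address(1, compressed=False) should give 1EHNa6Q4Jz2uvNExL497mE43ikXhwF6kZm: the same private key, two different addresses.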

odolvlobo
Legendary
Activity: 4956
Merit: 3757
March 12, 2026, 11:02:43 PM
Merited by vapourminer (1), ABCbits (1)
#3

Quote from: Tfs
> What are the largest databases of pairs (pk,adr) where pk is a private key (whatever format) and adr the corresponding P2PKH address?  I'm trying to estimate what kind of training data an AI would have.
> Thanks.

I'm curious about the purpose. Is your goal to train an AI to guess the private key from an address? Good luck with that!

If that is your goal, then I suggest starting with a goal that is potentially more achievable. A Bitcoin address is the result of deriving the public key from the private key and then hashing the public key twice using two different algorithms, so the AI would have to identify exploits in each of three different processes (elliptic-curve multiplication, SHA-256, and RIPEMD-160). RIPEMD-160 is the simplest of the three. Why not first test feasibility by training on RIPEMD-160 hashes?

Anyway, as mentioned before, you don't need to find a database; you can generate billions of addresses yourself.
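
A rough sketch of how such a feasibility test could be set up: generate (input, digest) pairs locally and, before training anything, measure how many digest bits change when a single input bit is flipped. The helper names (make_pairs, avalanche, the two digest wrappers) are hypothetical, the PRNG is seeded for reproducibility and is not cryptographically secure, and the RIPEMD-160 wrapper assumes the local OpenSSL build still exposes that algorithm; SHA-256 works as a stand-in where it does not.

```python
import hashlib
import random

def sha256_digest(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def ripemd160_digest(data: bytes) -> bytes:
    # Assumption: OpenSSL exposes ripemd160; otherwise this raises ValueError.
    return hashlib.new("ripemd160", data).digest()

def random_bytes(rng: random.Random, size: int) -> bytes:
    return bytes(rng.getrandbits(8) for _ in range(size))

def make_pairs(n, hash_fn, size=32, seed=0):
    """Return n reproducible (input, digest) training pairs."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        msg = random_bytes(rng, size)
        pairs.append((msg, hash_fn(msg)))
    return pairs

def avalanche(hash_fn, trials=200, size=32, seed=0):
    """Mean fraction of digest bits that flip when one input bit is flipped."""
    rng = random.Random(seed)
    flipped = total = 0
    for _ in range(trials):
        msg = bytearray(random_bytes(rng, size))
        before = hash_fn(bytes(msg))
        bit = rng.randrange(size * 8)
        msg[bit // 8] ^= 1 << (bit % 8)          # flip exactly one input bit
        after = hash_fn(bytes(msg))
        diff = int.from_bytes(before, "big") ^ int.from_bytes(after, "big")
        flipped += bin(diff).count("1")
        total += len(before) * 8
    return flipped / total
```

A result close to 0.5 from avalanche(ripemd160_digest) would mean that neighbouring inputs give unrelated-looking digests, which is the property a model would have to overcome.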

Join an anti-signature campaign: Click ignore on the members of signature campaigns.
PGP Fingerprint: 6B6BC26599EC24EF7E29A405EAF050539D0B2925 Signing address: 13GAVJo8YaAuenj6keiEykwxWUZ7jMoSLt
NotATether
Legendary
Activity: 2268
Merit: 9575
┻┻ ︵㇏(°□°㇏)
March 14, 2026, 10:00:09 AM
#4

If you have the private keys, why do you need to record the addresses? They can always be recomputed from the keys, so you are just wasting space.

The distribution of private key bits given the address characters is effectively uniform random, so AI cannot help you here. And besides, even for just the private keys that are already publicly known for whatever reason, storage would take petabytes and petabytes.

That is a wild guess, but I consider it a very large number.
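
To put rough numbers on the storage question, here is a back-of-the-envelope sketch. It assumes raw binary storage of 32 bytes per private key and 20 bytes per address (the hash160 payload, before Base58Check encoding); the function names are hypothetical and real databases would add index and format overhead.

```python
KEY_BYTES = 32       # raw 256-bit private key
HASH160_BYTES = 20   # P2PKH address payload, before Base58Check encoding

def bytes_needed(num_pairs, store_address=True):
    """Raw storage for num_pairs records, with or without the address."""
    per_pair = KEY_BYTES + (HASH160_BYTES if store_address else 0)
    return num_pairs * per_pair

def pairs_per_petabyte(store_address=True):
    """How many records fit in 10**15 bytes."""
    return 10**15 // bytes_needed(1, store_address)

print(bytes_needed(10**9) / 10**9, "GB for one billion pairs")
print(pairs_per_petabyte(), "pairs fit in 1 PB")
print(pairs_per_petabyte(store_address=False), "keys fit in 1 PB without addresses")
```

Under these assumptions a billion pairs take 52 GB, and a petabyte holds about 1.9 x 10^13 pairs, roughly 2^44 out of a key space of nearly 2^256.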

 
PocketAurora
Newbie
Activity: 6
Merit: 2
March 16, 2026, 12:20:34 PM
Merited by vapourminer (1), ABCbits (1)
#5

You probably won’t find a large public dataset of (private key → address) pairs because that would immediately compromise wallets. Most research datasets only contain addresses and transaction graphs, not private keys.

One of the biggest datasets used in research is this one:
https://arxiv.org/abs/2411.10325

It includes 252 million nodes and 785 million edges representing transactions, covering around 670 million transactions over ~13 years.

So realistically, AI models are trained on transaction graphs and address behavior, not private keys. :)
ABCbits
Legendary
Activity: 3542
Merit: 9837
Today at 07:15:19 AM
Merited by vapourminer (1)
#6

Quote from: PocketAurora
> You probably won’t find a large public dataset of (private key → address) pairs because that would immediately compromise wallets.

Or rather, it would compromise Bitcoin addresses that were generated with a weak RNG source.
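
A toy illustration of why a weak RNG matters: if keys come from a generator with only a few bits of seed entropy, every possible key can be enumerated and matched against a public value. Everything here is hypothetical: the 16-bit seed space is deliberately tiny, weak_private_key is not any real wallet's generator, and fingerprint is a SHA-256 stand-in for the actual address derivation.

```python
import hashlib
import random

SEED_BITS = 16  # deliberately tiny seed space for the demonstration

def weak_private_key(seed: int) -> int:
    """Toy generator: a 256-bit key fully determined by a small seed."""
    return random.Random(seed).getrandbits(256)

def fingerprint(key: int) -> str:
    """Stand-in for address derivation: a public, deterministic function of the key."""
    return hashlib.sha256(key.to_bytes(32, "big")).hexdigest()

def recover_seed(target: str):
    """Try every possible seed; return the one whose key matches the target."""
    for seed in range(2**SEED_BITS):
        if fingerprint(weak_private_key(seed)) == target:
            return seed
    return None

# Usage: the "victim" picks a seed, and only the fingerprint is published.
target = fingerprint(weak_private_key(12345))
print(recover_seed(target))
```

The loop covers only 65,536 candidates here; with a properly seeded 256-bit key the same enumeration is out of reach.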



Quote from: PocketAurora
> Most research datasets only contain addresses and transaction graphs, not private keys.
>
> One of the biggest datasets used in research is this one:
> https://arxiv.org/abs/2411.10325
>
> It includes 252 million nodes and 785 million edges representing transactions, covering around 670 million transactions over ~13 years.
>
> So realistically, AI models are trained on transaction graphs and address behavior, not private keys. :)

I doubt that's what OP is looking for, since he specifically mentions private keys. But it's somewhat interesting, since it uses this forum as one of its data sources.

Limits
Posts retrieved from BitcoinTalk may contain inaccuracies, misinformation, or deliberate falsehoods posted by user.
Bitcoin addresses were also extracted from user profiles on BitcoinTalk. Forum users often include their personal Bitcoin addresses in their profiles or signatures, displayed below their posts. We scraped the profiles of all forum posters from previously collected messages, labeling each identified address as ’individual

It's obvious to some people, but it's another reminder of the privacy concerns when you publish your data on a public forum.

LanternSapphire
Newbie
Activity: 6
Merit: 0
Today at 03:32:37 PM
#7

If you’re thinking in terms of AI training data, you’re mixing two different things:

Public datasets → addresses, transactions, balances

Private data → private keys (almost never available)

There are large datasets like this one:
https://www.nature.com/articles/s41597-025-04684-8

It contains hundreds of millions of nodes (addresses + transactions), but no private keys.

That’s because private keys are never exposed in normal Bitcoin usage; only public keys and signatures are.

So realistically, AI models are trained on transaction graphs, not keypairs.