You probably won’t find a large public dataset of (private key → address) pairs because that would immediately compromise wallets.
Or rather, it would compromise Bitcoin addresses that were generated with a weak RNG source.
Most research datasets only contain addresses and transaction graphs, not private keys.
One of the biggest datasets used in research is this one:
https://arxiv.org/abs/2411.10325
It includes 252 million nodes and 785 million edges representing transactions, covering around 670 million transactions over ~13 years.
So realistically, AI models are trained on transaction graphs and address behavior, not private keys.

I doubt it's what OP is looking for, since he specifically mentioned private keys. But it's somewhat interesting, since it uses this forum as one of its data sources.
Limits
Posts retrieved from BitcoinTalk may contain inaccuracies, misinformation, or deliberate falsehoods posted by users.
Bitcoin addresses were also extracted from user profiles on BitcoinTalk. Forum users often include their personal Bitcoin addresses in their profiles or signatures, displayed below their posts. We scraped the profiles of all forum posters from previously collected messages, labeling each identified address as ’individual
It's obvious to some people, but it's another reminder of the privacy concerns that come with publishing your data on a public forum.
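
To show how little effort it takes to pull addresses out of public posts or signatures, here is a rough Python sketch. This is not the paper's actual pipeline; the names ADDRESS_RE and extract_addresses are made up for illustration, and the patterns only check the general shape of legacy (1.../3...) and bech32 (bc1...) addresses, without verifying any checksum.

```python
import re

# Hypothetical pattern, for illustration only: it matches strings that
# look like mainnet Bitcoin addresses but does NOT validate checksums.
ADDRESS_RE = re.compile(
    r"\b(?:"
    r"[13][a-km-zA-HJ-NP-Z1-9]{25,34}"   # legacy Base58 (P2PKH / P2SH)
    r"|bc1[ac-hj-np-z02-9]{11,71}"       # bech32 / bech32m (SegWit)
    r")\b"
)


def extract_addresses(text):
    """Return address-like strings found in text, in order of appearance."""
    return ADDRESS_RE.findall(text)


if __name__ == "__main__":
    signature = (
        "Tips welcome: 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa "
        "or bc1qar0srrr7xfkvy5l643lydnw9re59gtzzwf5mdq"
    )
    for address in extract_addresses(signature):
        print(address)
```

A real scraper would also need checksum validation (Base58Check or bech32) to filter out false positives, but even this crude version shows that anything left in a profile or signature is trivial to collect.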