nathmaroal (OP)
Newbie
Offline
Activity: 6
Merit: 10
|
 |
October 21, 2025, 08:50:44 AM |
|
Hi everyone,
I’m looking for any archived Bitcoin network-level datasets from around 2012, ideally containing information about transaction propagation or node activity. I’ve read the early work by Dan Kaminsky and others on Bitcoin network monitoring, and I understand this is a long shot - but I figured it’s worth asking here. Back in the day, Blockchain.info used to internally log the IP addresses of nodes that relayed transactions. I’m wondering if any researchers, institutions, or enthusiasts from the early Bitcoin days might have collected similar data and are willing to share it - even partially or informally.
If you know of:
- any datasets from 2012 that include network-level metadata (e.g., relay info, timestamps, node behavior),
- old monitoring scripts or tools used to observe the Bitcoin P2P network,
- academic or personal archives from early Bitcoin research,
I’d be very grateful for any pointers. Feel free to share this with anyone who was active in Bitcoin security or network research back then - it might just reach the right person.
|
|
|
|
|
ABCbits
Legendary
Offline
Activity: 3556
Merit: 9880
|
 |
October 21, 2025, 08:58:49 AM Merited by vapourminer (4) |
|
I would be surprised if you could get data from back then. The earliest research data about the Bitcoin network that I know of is https://www.dsn.kastel.kit.edu/bitcoin/. They began collecting data in July 2015, although it's partially anonymized. https://bitnodes.io/ has existed since 2013, but I have no idea whether you can get such old data through their API.
|
|
|
|
nathmaroal (OP)
Newbie
Offline
Activity: 6
Merit: 10
|
 |
October 21, 2025, 12:07:30 PM |
|
Yeah, I've already been through those. The question is really about finding something other than the open data that is already available: private backups, academic work, and datasets that haven't been released. Any lead is welcome. I know that Blockchain.info had such records. Are they willing to share them? Is there a backup of this data somewhere?
|
|
|
|
|
|
Accardo
|
 |
October 21, 2025, 03:49:40 PM |
|
If you're comfortable with standard SQL, you can check out Google BigQuery. It has historical blockchain data going back to the BTC pizza transaction. There are two Bitcoin datasets you can query for blocks, transactions, timestamps, etc.: bigquery-public-data.bitcoin_blockchain and bigquery-public-data.crypto_bitcoin. Google processes 1 TB of queries per month for free; beyond that you have to pay. I tried this query, which I found on Google, for transactions that took place in 2012: SELECT * FROM `bigquery-public-data.crypto_bitcoin.transactions` WHERE DATE(block_timestamp) BETWEEN '2012-01-01' AND '2012-12-31' LIMIT 1000; Unfortunately it exceeds the free quota of 1 TB. It needed about 2.15 TB to process, which means I would have to pay to access the data.
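A rough sketch of why that query is so expensive and how it might be narrowed: BigQuery bills by the columns and partitions it scans, so `SELECT *` pulls in every column, and `LIMIT` does not reduce the scan. The snippet below builds a narrower query and shows how a dry run could estimate its cost before running it. The column names and the `block_timestamp_month` partition column are assumptions about the public table's schema and should be checked in the BigQuery console; `build_2012_query` and `estimate_bytes` are hypothetical helper names.

```python
def build_2012_query(columns=("hash", "block_timestamp", "input_count", "output_count"),
                     limit=1000):
    """Build a narrower 2012 query: explicit columns instead of SELECT *.

    Assumption: the table is partitioned on a DATE column named
    block_timestamp_month, so filtering on it lets BigQuery skip other years.
    """
    cols = ", ".join(f"`{c}`" for c in columns)
    return (
        f"SELECT {cols}\n"
        "FROM `bigquery-public-data.crypto_bitcoin.transactions`\n"
        "WHERE block_timestamp_month BETWEEN '2012-01-01' AND '2012-12-01'\n"
        f"LIMIT {int(limit)}"
    )


def estimate_bytes(sql):
    """Dry-run the query and return the number of bytes it would scan.

    Needs the third-party google-cloud-bigquery package and GCP credentials;
    a dry run does not execute the query.
    """
    from google.cloud import bigquery  # imported lazily so the sketch loads without it

    client = bigquery.Client()
    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    return client.query(sql, job_config=config).total_bytes_processed


if __name__ == "__main__":
    print(build_2012_query())
```

With credentials configured, `estimate_bytes(build_2012_query()) / 1e12` would give the scan size in TB to compare against the free tier.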
|
|
|
|
Donneski
Full Member
 
Offline
Activity: 602
Merit: 196
Contact Hhampuz for campaign
|
 |
October 21, 2025, 04:33:52 PM Merited by vapourminer (4) |
|
After seeing your topic, I did a little research to find the best possible answers to your question. From my findings, 2012 network-level Bitcoin data is really rare. However, there are a few leads you can try:
- University College London (UCL)’s “A Fistful of Bitcoins” project (Meiklejohn et al.): they gathered some network data around 2012–2013. If you can, reach out to them directly. Click here for the university's official website.
- Early crawlers / Bitnodes: from my findings, similar projects existed before Bitnodes went public, so you can click here.
- Archived repos: try searching for old network monitors like bitcoin-seeder or bitcoin-network-crawler using either the Internet Archive by clicking here or GitHub Archive by clicking here.
- BitcoinTalk archives: use the Bitcointalk search bar here to check the technical board for 2011–2013 discussions on “transaction propagation” or “relay IP”.
The truth is that finding raw datasets will be tough, but some old technical community members could still have fragments of those logs. This thread here titled Bitcoin block data (1013 GB): inputs, outputs and transactions by LoyceV could also be of great help to you. I wish you good luck.
|
|
|
|
LoyceV
Legendary
Offline
Activity: 3990
Merit: 21515
Thick-Skinned Gang Leader and Golden Feather 2021
|
This thread here titled Bitcoin block data (1013 GB): inputs, outputs and transactions by LoyceV could also be of great help to you.
This isn't what OP is looking for. Immutable data will always be easy to find on-chain.
|
¡uʍop ǝpᴉsdn pɐǝɥ ɹnoʎ ɥʇᴉʍ ʎuunɟ ʞool no⅄
|
|
|
ABCbits
Legendary
Offline
Activity: 3556
Merit: 9880
|
 |
October 22, 2025, 07:53:50 AM Merited by vapourminer (1) |
|
University College London (UCL)’s “A Fistful of Bitcoins” project (Meiklejohn et al.): they gathered some network data around 2012–2013. If you can, reach out to them directly. Click here for the university's official website.
I found what you're talking about at https://discovery.ucl.ac.uk/id/eprint/1490261/. But looking at section "3. DATA COLLECTION" of the PDF, it seems they collected address/transaction data rather than node data. CMIIW.
|
|
|
|
|
stwenhao
|
 |
October 24, 2025, 09:55:25 AM |
|
Immutable data will always be easy to find on-chain.
I doubt that in 2125 it will be as easy to get transactions from 2009 as it is today. But I could be wrong, of course; the future will tell. If downloading every transaction in plaintext is still needed after decades, it would mean that we have good reasons to worry about thousands of blocks being reorged. However, historically you needed something like six confirmations. Today you need one, maybe two or three, and sometimes even zero, or a fractional confirmation (like a proof that your transaction is included in a miner share with 79 leading zero bits, where the network requires 80 leading zero bits, which is something around 0.5 confirmation). In networks like LN, zero confirmations are accepted. So I really doubt that big reorgs of 1,000 or more blocks are expected, and for that reason things can be optimized. As a result, it can become harder to access historical data in plaintext, especially if other approaches, like ZK-proofs, are deployed in production (after so many decades we will probably have them, because there are many reasons to do that).
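The "0.5 confirmation" figure can be checked with a small calculation. This is only a sketch under the post's own simplification that difficulty is counted in leading zero bits (the real target is a 256-bit threshold, not a bit count); `fractional_confirmation` is a hypothetical helper name. Each extra required zero bit doubles the expected work, so a share that is one bit short represents half the work of a full block.

```python
def fractional_confirmation(share_zero_bits: int, network_zero_bits: int) -> float:
    """Expected work behind a share, as a fraction of one full block.

    Simplified model: finding a hash with k leading zero bits takes about
    2**k attempts, so the ratio of work is 2**(share_bits - network_bits).
    A share at or above the network requirement counts as one confirmation.
    """
    return min(1.0, 2.0 ** (share_zero_bits - network_zero_bits))


if __name__ == "__main__":
    # The example from the post: a 79-bit share on an 80-bit network.
    print(fractional_confirmation(79, 80))  # 0.5
    # Two bits short would be a quarter of a confirmation.
    print(fractional_confirmation(78, 80))  # 0.25
```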
|
|
|
|
BlackHatCoiner
Legendary
Offline
Activity: 1974
Merit: 9621
Bitcoin is ontological repair
|
 |
October 24, 2025, 02:40:20 PM |
|
If downloading every transaction in plaintext is still needed after decades, it would mean that we have good reasons to worry about thousands of blocks being reorged.
It's more about simplicity than about security. Sure, no one will reorg everything back to 2009, but why change what's already set in stone? It's pretty simple to understand that you download every piece of data since 2009, verify the proof-of-work, and reach the chain tip. Treating everything before 20XX as "valid" adds trust to the system with no added benefit.
|
|
|
|
LoyceV
Legendary
Offline
Activity: 3990
Merit: 21515
Thick-Skinned Gang Leader and Golden Feather 2021
|
Immutable data will always be easy to find on-chain.
I doubt that in 2125 it will be as easy to get transactions from 2009 as it is today. But I could be wrong, of course; the future will tell.
I doubt anyone in the year 2125 is going to mind downloading an additional 0.0007 PB to add the first 16 years of Bitcoin blocks to their system. Or should I say 0.0000000007 ZB by then?
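A quick check of those unit conversions, assuming decimal (SI) prefixes and a chain size of roughly 700 GB, which is the figure used elsewhere in this thread:

```python
# Decimal (SI) byte units; binary units (GiB, PiB) would give slightly different numbers.
GB = 10**9
PB = 10**15
ZB = 10**21

chain_bytes = 700 * GB  # assumed size of the first ~16 years of blocks

in_pb = chain_bytes / PB
in_zb = chain_bytes / ZB

print(f"{in_pb:.4f} PB")   # 0.0007 PB
print(f"{in_zb:.10f} ZB")  # 0.0000000007 ZB
```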
|
¡uʍop ǝpᴉsdn pɐǝɥ ɹnoʎ ɥʇᴉʍ ʎuunɟ ʞool no⅄
|
|
|
|
stwenhao
|
 |
October 24, 2025, 04:26:16 PM Merited by vapourminer (1) |
|
I doubt anyone in the year 2125 is going to mind downloading an additional 0.0007 PB
Downloading may work, but what about verification? There are CPU-mined altcoins where the total size of the chain is around 1 or 2 GB, yet they take longer to verify than the Bitcoin chain at around 700 GB. Why? Because CPU speed is not growing that fast. We now have 4 GHz, maybe 8 GHz in an edge case, but getting to 9 GHz is still around the world record: https://www.tomshardware.com/news/core-i9-14900kf-breaks-world-record-almost-achieves-91ghz
So where would 1 THz processors come from? Or even 16 GHz, instead of the existing 8 GHz? And would we need liquid nitrogen to provide proper cooling?
but why change what's already set in stone?
Because if some people won't run any node, or will only run an SPV client, then it is better to provide a middle ground: something more advanced than a typical SPV node, which would also benefit the rest of the network. You won't convince everyone to run a full archival node. And you won't convince everyone to download 700 GB, even in pruning mode. However, if there were full nodes with lower requirements than pruned nodes, where IBD could be done without downloading everything, they could be better than SPV nodes and move many users to software where they don't have to rely on other full nodes as much, because they can verify more than they do today.
Treating everything before 20XX as "valid" adds trust to the system with no added benefit.
The benefit is to have something more trustless than an SPV node, but with lower requirements than a pruned node. The alternative is quite simple: fewer people will run full nodes, more people will stick with SPV nodes, and as more and more standardness limits are lifted, more and more SPV nodes can be tricked into accepting something invalid.
|
|
|
|
nathmaroal (OP)
Newbie
Offline
Activity: 6
Merit: 10
|
 |
October 29, 2025, 05:00:26 PM |
|
If you're comfortable with standard SQL, you can check out Google BigQuery. It has historical blockchain data going back to the BTC pizza transaction.
Thanks everyone for your suggestions. Just to clarify, what I’m looking for isn’t just basic node metadata. What I’m really looking for is whether anyone was actively observing the blockchain in real time back in 2012 and kept records of what they saw, such as transaction propagation, timestamps, or node relays. The blockchain data itself is fully available through the various sources already mentioned in this thread, but that’s not what I’m after. I’m specifically interested in external observations or logs that were made at the time, not just the raw blockchain.
This thread here titled Bitcoin block data (1013 GB): inputs, outputs and transactions by LoyceV could also be of great help to you.
This isn't what OP is looking for. Immutable data will always be easy to find on-chain.
Exactly.
After seeing your topic, I did a little research to find the best possible answers to your question. From my findings, 2012 network-level Bitcoin data is really rare. However, there are a few leads you can try:
- University College London (UCL)’s “A Fistful of Bitcoins” project (Meiklejohn et al.): they gathered some network data around 2012–2013. If you can, reach out to them directly. Click here for the university's official website.
- Early crawlers / Bitnodes: from my findings, similar projects existed before Bitnodes went public, so you can click here.
- Archived repos: try searching for old network monitors like bitcoin-seeder or bitcoin-network-crawler using either the Internet Archive by clicking here or GitHub Archive by clicking here.
- BitcoinTalk archives: use the Bitcointalk search bar here to check the technical board for 2011–2013 discussions on “transaction propagation” or “relay IP”.
The truth is that finding raw datasets will be tough, but some old technical community members could still have fragments of those logs. This thread here titled Bitcoin block data (1013 GB): inputs, outputs and transactions by LoyceV could also be of great help to you. I wish you good luck.
Thanks for this! I will definitely take a look at these options.
|
|
|
|
|
Quickseller
Copper Member
Legendary
Offline
Activity: 3164
Merit: 2397
|
 |
October 29, 2025, 06:20:50 PM |
|
Back in the day, Blockchain.info used to internally log the IP addresses of nodes that relayed transactions. I’m wondering if any researchers, institutions, or enthusiasts from the early Bitcoin days might have collected similar data and are willing to share it - even partially or informally.
Blockchain.info was not connected to every node, so in most cases the logged IP address belonged to a peer that relayed the transaction, not to the node that originally sent it. Full nodes have never chosen their peers based on geographic proximity or lowest ping time, so an IP address from blockchain.info really won't even give you a general geographic area the transaction was broadcast from.
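The point about partial connectivity can be illustrated with a toy simulation. This is a rough sketch, not a model of the real 2012 network: it assumes a random peer graph, treats propagation time as hop count (ignoring latency and relay delays), and has a monitor log the first of its peers to relay each transaction. All names and parameters here are hypothetical.

```python
import random
from collections import deque


def build_random_graph(n, degree, rng):
    """Each node opens `degree` connections to random other nodes (undirected)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        others = [k for k in range(n) if k != i]
        for j in rng.sample(others, degree):
            adj[i].add(j)
            adj[j].add(i)
    return adj


def hop_distances(adj, origin):
    """Breadth-first search: hops from the origin to every reachable node."""
    dist = {origin: 0}
    queue = deque([origin])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def first_relayer(adj, origin, monitor_peers, rng):
    """The monitor's peer that receives the transaction first (ties broken at random)."""
    dist = hop_distances(adj, origin)
    reachable = [p for p in monitor_peers if p in dist]
    if not reachable:
        return None
    best = min(dist[p] for p in reachable)
    return rng.choice([p for p in reachable if dist[p] == best])


def attribution_accuracy(n=200, degree=8, coverage=0.25, trials=500, seed=1):
    """Fraction of transactions where the logged peer is the true origin."""
    rng = random.Random(seed)
    adj = build_random_graph(n, degree, rng)
    monitor_peers = rng.sample(range(n), int(n * coverage))
    hits = 0
    for _ in range(trials):
        origin = rng.randrange(n)
        if first_relayer(adj, origin, monitor_peers, rng) == origin:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for coverage in (0.1, 0.25, 0.5, 1.0):
        print(coverage, attribution_accuracy(coverage=coverage))
```

In this simplified model the logged IP is correct only when the monitor happens to be connected directly to the originating node, so accuracy tracks the monitor's coverage of the network.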
Out of curiosity, what exactly are you intending to analyse?
|
|
|
|
nathmaroal (OP)
Newbie
Offline
Activity: 6
Merit: 10
|
Out of curiosity, what exactly are you intending to analyse?
I'm conducting research on legacy centralized mixing services, and I'm attempting to deanonymize transactions by identifying patterns in their IP data. Essentially, I'm trying to map out a pool of addresses that were likely used by these mixers, which will then allow me to perform an IN/OUT flow analysis.
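A minimal sketch of what that two-step analysis might look like, assuming a hypothetical record format in which each observation links a transaction to the first IP that relayed it. The field names (`relay_ip`, `input_addresses`, `inputs`, `outputs`), the sample addresses, and both helper functions are illustrative only and do not correspond to any known dataset; fees are ignored.

```python
def candidate_pool(observations, suspect_ips):
    """Collect addresses that were spent from in transactions relayed by a suspect IP."""
    pool = set()
    for obs in observations:
        if obs["relay_ip"] in suspect_ips:
            pool.update(obs["input_addresses"])
    return pool


def in_out_flow(transactions, pool):
    """Sum the value entering and leaving the pool, skipping pool-internal moves."""
    inflow = outflow = 0
    for tx in transactions:
        from_pool = any(addr in pool for addr, _ in tx["inputs"])
        for addr, value in tx["outputs"]:
            if addr in pool and not from_pool:
                inflow += value   # outside funds deposited into the pool
            elif addr not in pool and from_pool:
                outflow += value  # pool funds paid out to an outside address
    return inflow, outflow


# Made-up sample data (IPs are from the documentation-reserved ranges).
observations = [
    {"txid": "t1", "relay_ip": "203.0.113.7", "input_addresses": ["A", "B"]},
    {"txid": "t2", "relay_ip": "198.51.100.9", "input_addresses": ["X"]},
]
transactions = [
    {"inputs": [("X", 500)], "outputs": [("A", 500)]},              # deposit
    {"inputs": [("A", 500)], "outputs": [("Y", 300), ("B", 200)]},  # payout plus internal change
]

pool = candidate_pool(observations, {"203.0.113.7"})
inflow, outflow = in_out_flow(transactions, pool)
print(sorted(pool), inflow, outflow)
```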
|
|
|
|
|
Quickseller
Copper Member
Legendary
Offline
Activity: 3164
Merit: 2397
|
 |
October 30, 2025, 05:25:50 PM |
|
Out of curiosity, what exactly are you intending to analyse?
I'm conducting research on legacy centralized mixing services, and I'm attempting to deanonymize transactions by identifying patterns in their IP data. Essentially, I'm trying to map out a pool of addresses that were likely used by these mixers, which will then allow me to perform an IN/OUT flow analysis.
Interesting research topic. Nodes come online and go offline all the time, so the same node may send two transactions that propagate differently, and those transactions may show up in your dataset as coming from two different IPs. I would also assume most mixers would use multiple nodes, but maybe not in 2012. Either way... good luck with your project. I am curious whether you're able to deanonymize mixer txns.
|
|
|
|
notocactus
Legendary
Offline
Activity: 2954
Merit: 4924
Glory to Ukraine!
|
 |
November 01, 2025, 06:21:53 PM |
|
I'm conducting research on legacy centralized mixing services, and I'm attempting to deanonymize transactions by identifying patterns in their IP data. Essentially, I'm trying to map out a pool of addresses that were likely used by these mixers, which will then allow me to perform an IN/OUT flow analysis.
Mixer creators and operators know the risks that come with their services, so it's unlikely that they wouldn't use Tor or a VPN. It's unlikely that mixers expose real IP addresses for their nodes, as they have enough technical knowledge of security, privacy, and threats from governments. Even now in 2025, data shows that about 64% of Bitcoin nodes use Tor. https://bitref.com/nodes/
|
|
|
|
Quickseller
Copper Member
Legendary
Offline
Activity: 3164
Merit: 2397
|
 |
November 01, 2025, 06:35:11 PM |
|
I'm conducting research on legacy centralized mixing services, and I'm attempting to deanonymize transactions by identifying patterns in their IP data. Essentially, I'm trying to map out a pool of addresses that were likely used by these mixers, which will then allow me to perform an IN/OUT flow analysis.
Mixer creators and operators know the risks that come with their services, so it's unlikely that they wouldn't use Tor or a VPN. It's unlikely that mixers expose real IP addresses for their nodes, as they have enough technical knowledge of security, privacy, and threats from governments. Even now in 2025, data shows that about 64% of Bitcoin nodes use Tor. https://bitref.com/nodes/
I know that you are just posting to increase your post count for your sig deal... but the OP is looking for a dataset from 2012. There were many operators of bitcoin-related businesses back then who were, at best, amateurs at handling security. It was also less clear to most people how bitcoin transactions could be tracked.
|
|
|
|
ABCbits
Legendary
Offline
Activity: 3556
Merit: 9880
|
 |
November 02, 2025, 08:40:43 AM |
|
I'm conducting research on legacy centralized mixing services, and I'm attempting to deanonymize transactions by identifying patterns in their IP data. Essentially, I'm trying to map out a pool of addresses that were likely used by these mixers, which will then allow me to perform an IN/OUT flow analysis.
Mixer creators and operators know the risks that come with their services, so it's unlikely that they wouldn't use Tor or a VPN. It's unlikely that mixers expose real IP addresses for their nodes, as they have enough technical knowledge of security, privacy, and threats from governments. Even now in 2025, data shows that about 64% of Bitcoin nodes use Tor. https://bitref.com/nodes/
FWIW, HTTPS wasn't popular before Edward Snowden disclosed the US global surveillance programs. So using Tor without encryption meant a malicious exit node could simply collect all of your data.
|
|
|
|
|