Bitcoin Forum
April 19, 2024, 05:22:08 PM *
News: Latest Bitcoin Core release: 26.0 [Torrent]
 
   Home   Help Search Login Register More  
Pages: [1]
  Print  
Author Topic: Querying the blockchain with SQL databases  (Read 164 times)
Accardo (OP)
Hero Member
*****
Offline Offline

Activity: 1064
Merit: 509


Leading Crypto Sports Betting & Casino Platform


View Profile
August 27, 2023, 10:34:09 AM
Merited by NotATether (5), ABCbits (3), vapourminer (2), DaveF (2), pooya87 (2), DdmrDdmr (1)
 #1

Carried out a search on some SQL programs that can query the blockchain, Saw Abe, but it required that I download bitcoin core, which wasn't what I needed. Knew of some RPC commands that people who run full nodes can use to retrieve data on the blockchain. So I checked out bigquery, read it on a PDF, a google product that has some bitcoin datasets, with two tables blocks and transaction, including interval structures stored in columns; TXin & txout of a transaction, stored in the inputs and outputs columns of the transactions respectively.

It all looked complex to understand at first glance, but read further to discover that it's mainly to help analysis for various kinds of analytic problems, and not for exploring individual transactions.

I queried for the transactions with zero fees, which I also read on the introductory aspect of the program, which explains that miners add their transactions to a block without paying fees. The query retrieved the number of transactions with zero fees

Code:
SELECT 
  ROUND((input_value - output_value)/ size, 0) AS fees_per_byte,
  COUNT(*) AS txn_cnt
FROM
  `bigquery-public-data.crypto_bitcoin.transactions`
WHERE TRUE
  AND block_timestamp >= '2023-08-24'
  AND is_coinbase IS FALSE
GROUP BY 1


Did another test run on another query I found on the article written about the bitcoin dataset, the query is to retrieve balances on different addresses.

Code:
WITH double_entry_book AS (
   -- debits
   SELECT
    array_to_string(inputs.addresses, ",") as address
   , inputs.type
   , -inputs.value as value
   FROM `bigquery-public-data.crypto_bitcoin.inputs` as inputs
   UNION ALL
   -- credits
   SELECT
    array_to_string(outputs.addresses, ",") as address
   , outputs.type
   , outputs.value as value
   FROM `bigquery-public-data.crypto_bitcoin.outputs` as outputs
)
SELECT
   address
,   type   
,   sum(value) as balance
FROM double_entry_book
GROUP BY 1,2
ORDER BY balance DESC
LIMIT 1000

This retrieves different addresses with the amount of bitcoin contained in them, but the numbers didn't look realistic at first, like this 5502219754649, then I copied one address: bc1qjasf9z3h7w3jspkhtgatgpyvvzgpa2wwd2lr0eh5tx44reyn2k7sfc27a4 searched on blockchain explorer and found that the digits doesn't have a decimal point that separates the number, though correct. It looked this way on block explorer 55022.19754649 Tried out different queries, which I may not post because I didn't understand the results, didn't stop loading or too long to share.

The bitcoin dataset is 'bigquery-public-data.crypto_bitcoin' other cryptocurrency datasets also exists too, you can read this thread

Google is still building the model at the moment I notice some limitations, you can also use Legacy or Google SQL to execute query on bigquery.

..Stake.com..   ▄████████████████████████████████████▄
   ██ ▄▄▄▄▄▄▄▄▄▄            ▄▄▄▄▄▄▄▄▄▄ ██  ▄████▄
   ██ ▀▀▀▀▀▀▀▀▀▀ ██████████ ▀▀▀▀▀▀▀▀▀▀ ██  ██████
   ██ ██████████ ██      ██ ██████████ ██   ▀██▀
   ██ ██      ██ ██████  ██ ██      ██ ██    ██
   ██ ██████  ██ █████  ███ ██████  ██ ████▄ ██
   ██ █████  ███ ████  ████ █████  ███ ████████
   ██ ████  ████ ██████████ ████  ████ ████▀
   ██ ██████████ ▄▄▄▄▄▄▄▄▄▄ ██████████ ██
   ██            ▀▀▀▀▀▀▀▀▀▀            ██ 
   ▀█████████▀ ▄████████████▄ ▀█████████▀
  ▄▄▄▄▄▄▄▄▄▄▄▄███  ██  ██  ███▄▄▄▄▄▄▄▄▄▄▄▄
 ██████████████████████████████████████████
▄▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▄
█  ▄▀▄             █▀▀█▀▄▄
█  █▀█             █  ▐  ▐▌
█       ▄██▄       █  ▌  █
█     ▄██████▄     █  ▌ ▐▌
█    ██████████    █ ▐  █
█   ▐██████████▌   █ ▐ ▐▌
█    ▀▀██████▀▀    █ ▌ █
█     ▄▄▄██▄▄▄     █ ▌▐▌
█                  █▐ █
█                  █▐▐▌
█                  █▐█
▀▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▀█
▄▄█████████▄▄
▄██▀▀▀▀█████▀▀▀▀██▄
▄█▀       ▐█▌       ▀█▄
██         ▐█▌         ██
████▄     ▄█████▄     ▄████
████████▄███████████▄████████
███▀    █████████████    ▀███
██       ███████████       ██
▀█▄       █████████       ▄█▀
▀█▄    ▄██▀▀▀▀▀▀▀██▄  ▄▄▄█▀
▀███████         ███████▀
▀█████▄       ▄█████▀
▀▀▀███▄▄▄███▀▀▀
..PLAY NOW..
Even if you use Bitcoin through Tor, the way transactions are handled by the network makes anonymity difficult to achieve. Do not expect your transactions to be anonymous unless you really know what you're doing.
Advertised sites are not endorsed by the Bitcoin Forum. They may be unsafe, untrustworthy, or illegal in your jurisdiction.
1713547328
Hero Member
*
Offline Offline

Posts: 1713547328

View Profile Personal Message (Offline)

Ignore
1713547328
Reply with quote  #2

1713547328
Report to moderator
ABCbits
Legendary
*
Offline Offline

Activity: 2856
Merit: 7385


Crypto Swap Exchange


View Profile
August 27, 2023, 12:13:56 PM
Merited by pooya87 (2)
 #2

That's interesting, i didn't know Google had public dataset including Bitcoin blockchain.

Carried out a search on some SQL programs that can query the blockchain, Saw Abe, but it required that I download bitcoin core, which wasn't what I needed. Knew of some RPC commands that people who run full nodes can use to retrieve data on the blockchain. So I checked out bigquery, read it on a PDF, a google product that has some bitcoin datasets, with two tables blocks and transaction, including interval structures stored in columns; TXin & txout of a transaction, stored in the inputs and outputs columns of the transactions respectively.

There are other platform to do that (such as https://www.dolthub.com/repositories/web3/bitcoin/data/main), but they no longer update their data.

This retrieves different addresses with the amount of bitcoin contained in them, but the numbers didn't look realistic at first, like this 5502219754649, then I copied one address: bc1qjasf9z3h7w3jspkhtgatgpyvvzgpa2wwd2lr0eh5tx44reyn2k7sfc27a4 searched on blockchain explorer and found that the digits doesn't have a decimal point that separates the number, though correct. It looked this way on block explorer 55022.19754649 Tried out different queries, which I may not post because I didn't understand the results, didn't stop loading or too long to share.

FYI, Bitcoin full node software which was initially written by Satoshi (now called Bitcoin Core) use int64_t to store amount of satoshi. So it's realistic/not weird to see other platform also store amount of satoshi under the hood.

█▀▀▀











█▄▄▄
▀▀▀▀▀▀▀▀▀▀▀
e
▄▄▄▄▄▄▄▄▄▄▄
█████████████
████████████▄███
██▐███████▄█████▀
█████████▄████▀
███▐████▄███▀
████▐██████▀
█████▀█████
███████████▄
████████████▄
██▄█████▀█████▄
▄█████████▀█████▀
███████████▀██▀
████▀█████████
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
c.h.
▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄
▀▀▀█











▄▄▄█
▄██████▄▄▄
█████████████▄▄
███████████████
███████████████
███████████████
███████████████
███░░█████████
███▌▐█████████
█████████████
███████████▀
██████████▀
████████▀
▀██▀▀
UFO Chaos
Newbie
*
Offline Offline

Activity: 10
Merit: 0


View Profile
August 27, 2023, 12:22:23 PM
 #3

Just yesterday I was wondering if this was possible, OP I would drop you several merit if I had any, thank you.
DaveF
Legendary
*
Offline Offline

Activity: 3458
Merit: 6209


Crypto Swap Exchange


View Profile WWW
August 27, 2023, 01:48:18 PM
Merited by vapourminer (1), ABCbits (1), garlonicon (1)
 #4

Just going to make the comment that if you want to do anything 'real' with the data you really should download core and import the data into the DB yourself.

Weather deliberate or by accident if you're relying on somebody else's data for financial transactions you are just asking for trouble. Did Google want to censor something? Oops you got the wrong answer to your query.
Did some engineer at Google take a shortcut and copy a set of data to save time and forgot to bring in the real data? oops you've got problems.

Just picking on Google since it's who you are talking about but any place that you're relying on their data you are relying on them to be trustworthy. Might not even have to be the organization itself. Could just be a rogue engineer who's trying to scam something.

-Dave

█▀▀▀











█▄▄▄
▀▀▀▀▀▀▀▀▀▀▀
e
▄▄▄▄▄▄▄▄▄▄▄
█████████████
████████████▄███
██▐███████▄█████▀
█████████▄████▀
███▐████▄███▀
████▐██████▀
█████▀█████
███████████▄
████████████▄
██▄█████▀█████▄
▄█████████▀█████▀
███████████▀██▀
████▀█████████
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
c.h.
▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄
▀▀▀█











▄▄▄█
▄██████▄▄▄
█████████████▄▄
███████████████
███████████████
███████████████
███████████████
███░░█████████
███▌▐█████████
█████████████
███████████▀
██████████▀
████████▀
▀██▀▀
Accardo (OP)
Hero Member
*****
Offline Offline

Activity: 1064
Merit: 509


Leading Crypto Sports Betting & Casino Platform


View Profile
August 28, 2023, 06:20:58 PM
Merited by vapourminer (1)
 #5

Just going to make the comment that if you want to do anything 'real' with the data you really should download core and import the data into the DB yourself.

Weather deliberate or by accident if you're relying on somebody else's data for financial transactions you are just asking for trouble. Did Google want to censor something? Oops you got the wrong answer to your query.
Did some engineer at Google take a shortcut and copy a set of data to save time and forgot to bring in the real data? oops you've got problems.

Just picking on Google since it's who you are talking about but any place that you're relying on their data you are relying on them to be trustworthy. Might not even have to be the organization itself. Could just be a rogue engineer who's trying to scam something.

-Dave

This is correct, I've seen multiple complaints of missing data on the old dataset "bigquery-public-data.bitcoin_blockchain.blocks", a case like missing transaction output not correct or shown, but the new dataset `bigquery-public-data.crypto_bitcoin.transactions`seem to have all the blockchain data and updated every 10 minutes or so, not sure, most times people who complain for missing data didn't send out the right query. Google developers also provided a form https://issuetracker.google.com/issues/new?component=187149&template=0 where people who have missing results or any underlying difficulty can send their complaints in details and their dispute may get settled. I'm not trusting them, only that they're trying to solve a problem on their own end, also putting a bitcoin dataset on their bigquery is a good move, despite the certainty people or users will time to time face issues unlike those who download bitcoin core and query directly by themselves. I understand your points, things can get fishy, since it's managed by some developers, aside it, we have explorers to crosscheck our results if unsatisfied.

..Stake.com..   ▄████████████████████████████████████▄
   ██ ▄▄▄▄▄▄▄▄▄▄            ▄▄▄▄▄▄▄▄▄▄ ██  ▄████▄
   ██ ▀▀▀▀▀▀▀▀▀▀ ██████████ ▀▀▀▀▀▀▀▀▀▀ ██  ██████
   ██ ██████████ ██      ██ ██████████ ██   ▀██▀
   ██ ██      ██ ██████  ██ ██      ██ ██    ██
   ██ ██████  ██ █████  ███ ██████  ██ ████▄ ██
   ██ █████  ███ ████  ████ █████  ███ ████████
   ██ ████  ████ ██████████ ████  ████ ████▀
   ██ ██████████ ▄▄▄▄▄▄▄▄▄▄ ██████████ ██
   ██            ▀▀▀▀▀▀▀▀▀▀            ██ 
   ▀█████████▀ ▄████████████▄ ▀█████████▀
  ▄▄▄▄▄▄▄▄▄▄▄▄███  ██  ██  ███▄▄▄▄▄▄▄▄▄▄▄▄
 ██████████████████████████████████████████
▄▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▄
█  ▄▀▄             █▀▀█▀▄▄
█  █▀█             █  ▐  ▐▌
█       ▄██▄       █  ▌  █
█     ▄██████▄     █  ▌ ▐▌
█    ██████████    █ ▐  █
█   ▐██████████▌   █ ▐ ▐▌
█    ▀▀██████▀▀    █ ▌ █
█     ▄▄▄██▄▄▄     █ ▌▐▌
█                  █▐ █
█                  █▐▐▌
█                  █▐█
▀▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▀█
▄▄█████████▄▄
▄██▀▀▀▀█████▀▀▀▀██▄
▄█▀       ▐█▌       ▀█▄
██         ▐█▌         ██
████▄     ▄█████▄     ▄████
████████▄███████████▄████████
███▀    █████████████    ▀███
██       ███████████       ██
▀█▄       █████████       ▄█▀
▀█▄    ▄██▀▀▀▀▀▀▀██▄  ▄▄▄█▀
▀███████         ███████▀
▀█████▄       ▄█████▀
▀▀▀███▄▄▄███▀▀▀
..PLAY NOW..
pooya87
Legendary
*
Offline Offline

Activity: 3430
Merit: 10491



View Profile
August 29, 2023, 04:34:55 AM
Merited by vapourminer (1)
 #6

found that the digits doesn't have a decimal point that separates the number
Decimals are only used in the User Interface to show the value to the user in different ways. Otherwise as far as the protocol and consensus rules are concerned there is no decimal values used anywhere. Every number in Bitcoin is an integer (a whole number) of different size, most are 32 bit (like version and locktime), some are variable size (like script lengths), we have a couple of 256 bit integers (used in difficulty) and the only 64 bit integers we have are the "amounts" field in the outputs of each transaction.

.
.BLACKJACK ♠ FUN.
█████████
██████████████
████████████
█████████████████
████████████████▄▄
░█████████████▀░▀▀
██████████████████
░██████████████
████████████████
░██████████████
████████████
███████████████░██
██████████
CRYPTO CASINO &
SPORTS BETTING
▄▄███████▄▄
▄███████████████▄
███████████████████
█████████████████████
███████████████████████
█████████████████████████
█████████████████████████
█████████████████████████
███████████████████████
█████████████████████
███████████████████
▀███████████████▀
█████████
.
NotATether
Legendary
*
Offline Offline

Activity: 1582
Merit: 6670


bitcoincleanup.com / bitmixlist.org


View Profile WWW
August 29, 2023, 06:49:33 AM
Merited by vapourminer (1)
 #7

Just going to make the comment that if you want to do anything 'real' with the data you really should download core and import the data into the DB yourself.

You can't do that because Bitcoin Core is using LevelDB which is not a relational database, but a NoSQL-type. You could still use a script to conver tthe SQL into whatever format is being dumped into the LevelDB but that is way too time-consuming.

This is correct, I've seen multiple complaints of missing data on the old dataset "bigquery-public-data.bitcoin_blockchain.blocks", a case like missing transaction output not correct or shown, but the new dataset `bigquery-public-data.crypto_bitcoin.transactions`seem to have all the blockchain data and updated every 10 minutes or so, not sure, most times people who complain for missing data didn't send out the right query. Google developers also provided a form https://issuetracker.google.com/issues/new?component=187149&template=0 where people who have missing results or any underlying difficulty can send their complaints in details and their dispute may get settled. I'm not trusting them, only that they're trying to solve a problem on their own end, also putting a bitcoin dataset on their bigquery is a good move, despite the certainty people or users will time to time face issues unlike those who download bitcoin core and query directly by themselves. I understand your points, things can get fishy, since it's managed by some developers, aside it, we have explorers to crosscheck our results if unsatisfied.

Datasets are seldom updated, only a few times per year. It is unlikely that the data contained inside will ever be up-to-date and suitable for real-time use.

.
.BLACKJACK ♠ FUN.
█████████
██████████████
████████████
█████████████████
████████████████▄▄
░█████████████▀░▀▀
██████████████████
░██████████████
████████████████
░██████████████
████████████
███████████████░██
██████████
CRYPTO CASINO &
SPORTS BETTING
▄▄███████▄▄
▄███████████████▄
███████████████████
█████████████████████
███████████████████████
█████████████████████████
█████████████████████████
█████████████████████████
███████████████████████
█████████████████████
███████████████████
▀███████████████▀
█████████
.
NotATether
Legendary
*
Offline Offline

Activity: 1582
Merit: 6670


bitcoincleanup.com / bitmixlist.org


View Profile WWW
August 31, 2023, 10:22:40 AM
 #8

What exactly do you mean by time consuming?

I meant writing a block parser to database, by hand. But it seems that someone has already done that as your link to Github shows.

But it has a problem: It depends on Python 2.7, which has been unsupported for a while, and most distros don't even carry Python2 anymore. It should be straightforward to port it to Python 3 if it's not using Python2 dependencies as well.

.
.BLACKJACK ♠ FUN.
█████████
██████████████
████████████
█████████████████
████████████████▄▄
░█████████████▀░▀▀
██████████████████
░██████████████
████████████████
░██████████████
████████████
███████████████░██
██████████
CRYPTO CASINO &
SPORTS BETTING
▄▄███████▄▄
▄███████████████▄
███████████████████
█████████████████████
███████████████████████
█████████████████████████
█████████████████████████
█████████████████████████
███████████████████████
█████████████████████
███████████████████
▀███████████████▀
█████████
.
Pages: [1]
  Print  
 
Jump to:  

Powered by MySQL Powered by PHP Powered by SMF 1.1.19 | SMF © 2006-2009, Simple Machines Valid XHTML 1.0! Valid CSS!