Bitcoin Forum
July 30, 2025, 10:18:48 AM *
News: Latest Bitcoin Core release: 29.0 [Torrent]
 
   Home   Help Search Login Register More  
Pages: [1]
  Print  
Author Topic: Playing with analytics  (Read 252 times)
JuxtaposeLife (OP)
Newbie
*
Offline Offline

Activity: 16
Merit: 44


View Profile
February 13, 2025, 03:19:20 AM
Merited by pooya87 (4), ABCbits (2), Findingnemo (1)
 #1

Spent the last few months exploring the raw data after setting up a node, and constructing a relational database with a goal of tinkering with analytics -- exploring mostly. My database is way too large (almost 4TB - a lot of that indexing) so I need to work on condensing the data to more relevant tables for ongoing discovery.

If this isn't the right place for this, appologies. But I thought I would document some findings, and hopefully engage in some discussion or get some inspiration on what else to look for. I realize I should probably have a time series dataset as well... I'm very curious to model the hodl waves. If any of my data points below are egregiously wrong, knowing that would be good also haha. It's entirely possible I'd have a data integrity issue, despite my best efforts on that front.

I started looking at the utxos (I'm storing them all, and toggling an is_spent to true when they are used). Trying to make sense of what would be considered dust, or likely lost due to time and size. Since Bitcoins start, I see 3.2 billion utxos, of which 179,288,217 haven't been spent (yet)... of that group 86,898,633 of them have a balance of less than 1000 sats (0.00001 BTC or roughly $1 USD currently - at or below network fees). The combined total of all of this unspent 'dust' is 426.3546 BTC. Already this year (2025, or since block 877279) 18.95 BTC has been added to this pile of dust. If I set it to 0.0001 as the threshold, the total is 56.9 so far this year. Worth noting my database was last updated about 20 hours ago, need to create a cron to ingest on every new block as they come. Work in progress...

Is this 0.0001 dust accumulation primarily due to negligence? Or is some other mechanism leading to so much of it? If anyone has ideas of other things to look into I'd love some suggestions Smiley

I think my goal at the end of this is to help others learn about BTC (and with this one, learning how not to end up with dust - by reminding people to consolidate when possible), I have been told I'm fairly good at taking complex and breaking it down to easier to understand language. I'm still very much in learning mode myself... having fun.
ABCbits
Legendary
*
Offline Offline

Activity: 3318
Merit: 8966



View Profile
February 13, 2025, 08:54:26 AM
Merited by pooya87 (4), Findingnemo (1), JuxtaposeLife (1)
 #2

Trying to make sense of what would be considered dust,

Bitcoin Core have 546 satoshi as dust limit, although it was 5460 satoshi long time ago. Although the exact way to calculate dust limit is more complex, see https://github.com/bitcoin/bitcoin/blob/v28.1/src/policy/policy.cpp#L26.

Is this 0.0001 dust accumulation primarily due to negligence? Or is some other mechanism leading to so much of it? If anyone has ideas of other things to look into I'd love some suggestions Smiley

Since you don't mention when those UTXO dust created, i would speculate most of them are created by Ordinals. See https://bitcoin.stackexchange.com/a/118262.

Findingnemo
Hero Member
*****
Offline Offline

Activity: 2772
Merit: 929


Bitcoin = Financial freedom


View Profile
February 13, 2025, 07:07:32 PM
Merited by JuxtaposeLife (3)
 #3

Is this 0.0001 dust accumulation primarily due to negligence? Or is some other mechanism leading to so much of it? If anyone has ideas of other things to look into I'd love some suggestions Smiley

Since you don't mention when those UTXO dust created, i would speculate most of them are created by Ordinals. See https://bitcoin.stackexchange.com/a/118262.

OP can verify this data by analysing the dust accumulation over the time and see if it's spikes exactly when the ordinals started spamming the network.

The other contributors are some wallets which doesn't allow coin control lead to created unnecessary dust amount into the change address.

▄███████████████████▄
████████████████████████

██████████▀▀▀▀██████████
███████████████▀▀███████
█████████▄▄███▄▄█████
████████▀▀████▀███████
█████████▄▄██▀██████████
████████████▄███████████
██████████████▄█████████
██████████▀▀███▀▀███████
███████████████████████
█████████▄▄████▄▄████████
▀███████████████████▀
.
 BC.GAME 
███████████████
███████████████
███████████████
███████████████
██████▀░▀██████
████▀░░░░░▀████
███░░░░░░░░░███
███▄░░▄░▄░░▄███
█████▀░░░▀█████

███████████████

███████████████

███████████████

███████████████
███████████████
███████████████
███████████████
███████████████
███░░▀░░░▀░░███
███░░▄▄▄░░▄████
███▄▄█▀░░▄█████
█████▀░░▐██████
█████░░░░██████

███████████████

███████████████

███████████████

███████████████
███████████████
███████████████
███████████████
███████████████
██████▀▀░▀▄░███
████▀░░▄░▄░▀███
███▀░░▀▄▀▄░▄███
███▄░░▀░▀░▄████
███░▀▄░▄▄██████

███████████████

███████████████

███████████████

███████████████

DEPOSIT BONUS
.1000%.
GET FREE
...5 BTC...

REFER & EARN
..$1000 + 15%..
COMMISSION


 Play Now 
JuxtaposeLife (OP)
Newbie
*
Offline Offline

Activity: 16
Merit: 44


View Profile
February 14, 2025, 04:00:50 AM
 #4

Since you don't mention when those UTXO dust created, i would speculate most of them are created by Ordinals. See https://bitcoin.stackexchange.com/a/118262.

I'll need to do some more digging into what exactly Ordinals are. I'm still cleaning up the data. I discovered that about 140m of my utxos seem to be OP_RETURN and have an amount of 0.00000000 but are marked as unspent (or rather, were never spent after being created). I'm guessing these were just place markers for people to add comments to the chain? I ordered these by script type and 99.9% of them are of type nulldata.

I'm trying to decide if I should extract these into a separate table to maintain the ingerity (in case in the future I want to distinguish them for some other purpose), or simply mark them as false for the is_spent column to get them out of the way when I do queries on spendable utxos. I suppose I could just exclude them by the nulldata association? Thinking out loud here...


OP can verify this data by analysing the dust accumulation over the time and see if it's spikes exactly when the ordinals started spamming the network.

The other contributors are some wallets which doesn't allow coin control lead to created unnecessary dust amount into the change address.

Good idea. Once I can run some more efficient queries, I'll look at clustering them by data/time.

pooya87
Legendary
*
Offline Offline

Activity: 3892
Merit: 11786



View Profile
February 14, 2025, 05:31:18 AM
Merited by ABCbits (3)
 #5

Is this 0.0001 dust accumulation primarily due to negligence? Or is some other mechanism leading to so much of it?
Apart from the recent Ordinals Attack that contributed to a large number of small UTXOs, majority of the old ones are mainly because of Dusting Attack. This attack is performed as an attempt to deanonymize the addresses that are sent these dust amounts. The "hope" is that the receiver would use other UTXOs from other addresses previously not-linked to the dusted address when consolidating the coins, therefore linking them all together.

Quote
If anyone has ideas of other things to look into I'd love some suggestions Smiley
Try analyzing scripts looking for non-standard/unusual output scripts, unusual signatures, and stuff like that. It could be interesting.

I discovered that about 140m of my utxos seem to be OP_RETURN and have an amount of 0.00000000 but are marked as unspent (or rather, were never spent after being created). I'm guessing these were just place markers for people to add comments to the chain? I ordered these by script type and 99.9% of them are of type nulldata.
Technically OP_RETURN outputs are never included in the UTXO set because they are not unspent (ie. not yet spent, may be spent in the future) transaction outputs. They are unspendable as in they cannot be spent ever or they are provably unspendable.

OP_RETURN is just the standard way of adding a small arbitrary data to the bitcoin blockchain, which is why they call it "nulldata".

ABCbits
Legendary
*
Offline Offline

Activity: 3318
Merit: 8966



View Profile
February 14, 2025, 08:16:27 AM
 #6

Forget to mention it earlier, but have you consider using existing statistic/analytic tool such as Statoshi[1-2]? You may get some inspiration of what you could analyze further.

OP_RETURN is just the standard way of adding a small arbitrary data to the bitcoin blockchain, which is why they call it "nulldata".

For additional reference, OP_RETURN used by OmniLayer and Runes protocol to create token. OmniLayer was mainly used for USDT which AFAIK no longer supported by Tether, while Runes is about NFT/token.

[1] https://github.com/jlopp/statoshi
[2] https://statoshi.info

pooya87
Legendary
*
Offline Offline

Activity: 3892
Merit: 11786



View Profile
February 14, 2025, 12:41:11 PM
 #7

OP_RETURN is just the standard way of adding a small arbitrary data to the bitcoin blockchain, which is why they call it "nulldata".

For additional reference, OP_RETURN used by OmniLayer and Runes protocol to create token. OmniLayer was mainly used for USDT which AFAIK no longer supported by Tether, while Runes is about NFT/token.
That's right but it is worth mentioning that from the perspective of Bitcoin protocol and its smart contract language (scripts) what is followed by OP_RETURN is arbitrary data that won't be interpreted in any other way.
Outside Bitcoin protocol, usually in centralized databases (like Tether) they can interpret that arbitrary data however they like with their own protocol such as using it as token transfers inside Omni Layer.

Accardo
Hero Member
*****
Offline Offline

Activity: 1526
Merit: 575


Leading Crypto Sports Betting & Casino Platform


View Profile
February 15, 2025, 07:25:11 AM
 #8

For additional reference, OP_RETURN used by OmniLayer and Runes protocol to create token. OmniLayer was mainly used for USDT which AFAIK no longer supported by Tether, while Runes is about NFT/token.
That's right but it is worth mentioning that from the perspective of Bitcoin protocol and its smart contract language (scripts) what is followed by OP_RETURN is arbitrary data that won't be interpreted in any other way.
Outside Bitcoin protocol, usually in centralized databases (like Tether) they can interpret that arbitrary data however they like with their own protocol such as using it as token transfers inside Omni Layer.

OP_RETURN, in Bitcoin protocol, prune transaction embedded with data out of the Utxo set and make them unspendable, which creates a way for the Opcode to be used for burning bicoin. Runes protocol, for instance, then uses the Opcode to hold just about 80bytes of fungible data token, that includes, name, minting, token ID, and symbol, to aid the transfers of Runestoken.

..Stake.com..   ▄████████████████████████████████████▄
   ██ ▄▄▄▄▄▄▄▄▄▄            ▄▄▄▄▄▄▄▄▄▄ ██  ▄████▄
   ██ ▀▀▀▀▀▀▀▀▀▀ ██████████ ▀▀▀▀▀▀▀▀▀▀ ██  ██████
   ██ ██████████ ██      ██ ██████████ ██   ▀██▀
   ██ ██      ██ ██████  ██ ██      ██ ██    ██
   ██ ██████  ██ █████  ███ ██████  ██ ████▄ ██
   ██ █████  ███ ████  ████ █████  ███ ████████
   ██ ████  ████ ██████████ ████  ████ ████▀
   ██ ██████████ ▄▄▄▄▄▄▄▄▄▄ ██████████ ██
   ██            ▀▀▀▀▀▀▀▀▀▀            ██ 
   ▀█████████▀ ▄████████████▄ ▀█████████▀
  ▄▄▄▄▄▄▄▄▄▄▄▄███  ██  ██  ███▄▄▄▄▄▄▄▄▄▄▄▄
 ██████████████████████████████████████████
▄▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▄
█  ▄▀▄             █▀▀█▀▄▄
█  █▀█             █  ▐  ▐▌
█       ▄██▄       █  ▌  █
█     ▄██████▄     █  ▌ ▐▌
█    ██████████    █ ▐  █
█   ▐██████████▌   █ ▐ ▐▌
█    ▀▀██████▀▀    █ ▌ █
█     ▄▄▄██▄▄▄     █ ▌▐▌
█                  █▐ █
█                  █▐▐▌
█                  █▐█
▀▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▀█
▄▄█████████▄▄
▄██▀▀▀▀█████▀▀▀▀██▄
▄█▀       ▐█▌       ▀█▄
██         ▐█▌         ██
████▄     ▄█████▄     ▄████
████████▄███████████▄████████
███▀    █████████████    ▀███
██       ███████████       ██
▀█▄       █████████       ▄█▀
▀█▄    ▄██▀▀▀▀▀▀▀██▄  ▄▄▄█▀
▀███████         ███████▀
▀█████▄       ▄█████▀
▀▀▀███▄▄▄███▀▀▀
..PLAY NOW..
JuxtaposeLife (OP)
Newbie
*
Offline Offline

Activity: 16
Merit: 44


View Profile
February 15, 2025, 04:31:02 PM
 #9

I did a print out of the combined total of all utxos where is_spent = false and I got back
20009210.72379106 ...

which is close, but higher than the total supply in circulation (around 19.82m)

I'm hoping the difference has something to do with a certain script type not strigger the is_spent variable I track... I'm also aware that the node maintains a current list of all utxos, so maybe my best bet is to do a comparison of utxo ID fields?

Curious if anything stands out to anyone on this: Here is a print out of amounts summed by script_type...

Code:
      script_type      |    cnt    |   total_amount
-----------------------+----------+------------------
 pubkeyhash            | 52505234 | 6812722.99744588
 witness_v0_keyhash    | 49812368 | 5950961.07712848
 scripthash            | 14085469 | 4212919.07096102
 pubkey                |    45304 | 1720781.42621185
 witness_v0_scripthash |  1589794 | 1161322.78037109
 witness_v1_taproot    | 58752893 |  147776.69085131
 nonstandard           |   187875 |    2627.60490072
 multisig              |  1872416 |      57.31497578
 nulldata              |   143801 |      41.75518261
 witness_unknown       |      199 |       0.00576232

Not worth doing more dust accounting until I get this table synced with the current chain...

I'll do a join with the tx_inputs table to check for situations where a spend has happened but possibly didn't get reflected in my utxos table. I'll have to mark down where this happened... hopefully don't be an ongoing issue forward, but I guess could be accounted for programatically as new data comes in... making progress...
pooya87
Legendary
*
Offline Offline

Activity: 3892
Merit: 11786



View Profile
February 16, 2025, 04:51:38 AM
Merited by ABCbits (1)
 #10

OP_RETURN, in Bitcoin protocol, prune transaction embedded with data out of the Utxo set and make them unspendable,
The protocol doesn't make them unspendable by pruning them out of the UTXO set (or rather not including them in first place), they are unspendable because of how that OP code is defined. Not to mention that bitcoin core is only skipping certain scripts that contain OP_RETURN[1] not all of them and that behavior can be categorized as a preference by this client not as part of the protocol.

The way protocol makes them unspendable is by defining how they are treated[2] which is: if the interpreter reaches OP_RETURN the script evaluation fails right away. Otherwise the following output script is also unspendable but bitcoin core will include it in the UTXO set:
Code:
OP_1 OP_RETURN

P.S. FWIW when I say "reaches them" I mean the following output script is spendable regardless of the OP_RETURN inside it:
Code:
OP_1 OP_IF OP_1 OP_ELSE OP_RETURN OP_ENDIF

[1] https://github.com/bitcoin/bitcoin/blob/43e71f74988b2ad87e4bfc0e1b5c921ab86ec176/src/script/script.h#L571
[2] https://github.com/bitcoin/bitcoin/blob/43e71f74988b2ad87e4bfc0e1b5c921ab86ec176/src/script/interpreter.cpp#L666-L669

Runes protocol, for instance, then uses the Opcode to hold just about 80bytes of fungible data token, that includes, name, minting, token ID, and symbol, to aid the transfers of Runestoken.
That is still arbitrary data as far as Bitcoin protocol is concerned.

Pages: [1]
  Print  
 
Jump to:  

Powered by MySQL Powered by PHP Powered by SMF 1.1.19 | SMF © 2006-2009, Simple Machines Valid XHTML 1.0! Valid CSS!