Bitcoin Forum
Topic: How to generate list of all non-zero balances of Bitcoin blockchain?
zorro_tolerant (OP)
January 11, 2018, 05:06:36 PM
 #1

Hi all,

I have tried a couple of clients and libraries, but none of them can smoothly generate a list of all non-zero balances from the blockchain.
Some fail on the latest blocks, which is probably related to some format change, and some just throw a 'segmentation fault' error and do not provide any additional data.
For instance, the blockparser tool only lets me get the transactions up to block 488466. Does anyone know of a solution that works out of the box?

Regards
Anti-Cen
January 11, 2018, 06:44:32 PM
 #2

The blockchain is really a linked list of blocks: the hash of the previous block is part of
the data that the current block hashes, so any change to the data in an earlier block will
throw the whole chain out. It's like writing things in stone (read-only, in DB terms).
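
A toy sketch of that hash linking, under loud assumptions: real Bitcoin blocks hash an 80-byte header with double SHA-256, while this just hashes a string, and the names block_hash, build_chain and first_broken_link are made up for the illustration.

```python
import hashlib

GENESIS_PREV = "00" * 32  # placeholder "previous hash" for the first toy block


def block_hash(prev_hash: str, data: str) -> str:
    """Toy block hash: SHA-256 over the previous hash plus this block's data."""
    return hashlib.sha256((prev_hash + data).encode()).hexdigest()


def build_chain(payloads):
    """Return a list of (prev_hash, data, hash) tuples, each linked to the one before."""
    chain, prev = [], GENESIS_PREV
    for data in payloads:
        current = block_hash(prev, data)
        chain.append((prev, data, current))
        prev = current
    return chain


def first_broken_link(chain):
    """Return the index of the first block that no longer verifies, or None."""
    prev = GENESIS_PREV
    for index, (stored_prev, data, stored_hash) in enumerate(chain):
        if stored_prev != prev or block_hash(stored_prev, data) != stored_hash:
            return index
        prev = stored_hash
    return None


chain = build_chain(["a pays b", "b pays c", "c pays d"])
intact = first_broken_link(chain)

# Tamper with the middle block's data without recomputing any hashes.
chain[1] = (chain[1][0], "b pays mallory", chain[1][2])
broken_at = first_broken_link(chain)
print(intact, broken_at)
```

Repairing the tampered block would mean recomputing its hash, which in turn changes the input to every later block's hash.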

Working out a wallet's balance gets very complicated because you have to walk back
down the chain, open each block and look for the partial coins (outputs) that make up the coins in the wallet.
We are not using a relational database here, so it means reading all 200 GB of data in the chain.

If I wanted to list balances for all non-zero wallets, I would want to walk forwards.
The chain is not a doubly linked list (one you can walk both ways), but a pre-scan could build an index that lets it work like one.

Then you are talking: you would create wallet objects and keep track of balances moving forwards
in time.
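
The forward walk can be sketched roughly like this. It is a minimal illustration on made-up data, not a real parser: the transaction dictionaries, the names and the toy_chain values are all assumptions, and values are in satoshi.

```python
from collections import defaultdict


def balances_from_transactions(transactions):
    """Walk transactions oldest-first, tracking unspent outputs and per-address balances."""
    utxos = {}                   # (txid, output index) -> (address, value)
    balances = defaultdict(int)  # address -> running balance
    for tx in transactions:
        for prev_txid, prev_index in tx["inputs"]:
            # A spent output is removed, so memory only holds what is still unspent.
            address, value = utxos.pop((prev_txid, prev_index))
            balances[address] -= value
        for index, (address, value) in enumerate(tx["outputs"]):
            utxos[(tx["txid"], index)] = (address, value)
            balances[address] += value
    # Keep only the non-zero balances.
    return {address: value for address, value in balances.items() if value}


toy_chain = [
    {"txid": "t0", "inputs": [], "outputs": [("alice", 50)]},  # coinbase-like, no inputs
    {"txid": "t1", "inputs": [("t0", 0)], "outputs": [("bob", 30), ("alice", 20)]},
    {"txid": "t2", "inputs": [("t1", 0)], "outputs": [("carol", 30)]},
]
result = balances_from_transactions(toy_chain)
print(result)
```

Because every input points at an output seen earlier, a single forward pass is enough and nothing has to be walked backwards.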

200 GB of data does not scale to lots of little machines, and databases get slow, even with 16 GB of RAM
and the data spread over several disk drives, long before they get through a 200 GB dataset. This is where the
trouble with Bitcoin needs addressing, not with stupid fixes like Lightning, but the developers seem to love the
concept of blocks being full, like running out of water in the desert.

I guess you want the balances as a snapshot to speed things up for now, and this
is just what the development team would have done if they were not filling their pockets and
laughing at us.

In the land of plenty, the fool is thirsty (Bob Marley 1974)  


Mining is CPU-wars and Intel, AMD like it nearly as much as big oil likes miners wasting electricity. Is this what mankind has come to?
zorro_tolerant (OP)
January 11, 2018, 07:32:03 PM
 #3

Thanks for your reply! Yes, I agree that reading all the transactions and caching all the addresses is a heavily memory-consuming task, but I've seen that the blockparser implementation uses Google's sparsehash library for this data structure, and it's possible to fit everything into 64 GB of RAM if transactions that are no longer needed are released properly. As I said, it is possible to get the full list of non-zero balances; the problem is that at some point the block format or protocol changed, and old parsers are no longer compatible with it. So I wonder whether there are any up-to-date solutions for parsing the whole blockchain efficiently?



Anti-Cen
January 12, 2018, 12:21:02 AM
 #4

If you run a full node and wait for the whole chain to download, then maybe someone can knock up some code for you that will
talk RPC to the node on your local machine, and that should give you what you need.

Some blocks, I am told, are really odd to read and understand.

If you look around on the internet, someone has imported the whole thing (the blockchain) into a database
with a few relational tables. They let you write raw SQL statements against the database
live online, and they might have built a roll-up table to make your task easy.

I am good at SQL. You will need a GROUP BY in the query to sum the totals, but unless it's
running on some high-end machine you will need to break it down because of timeouts, and this
would be above my pay grade unless there is a roll-up or view containing the right SQL functions.
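
A minimal sketch of that kind of GROUP BY query, run against an in-memory SQLite table from Python. The outputs table, its columns and the sample rows are assumptions made up for the illustration; a real import of the chain would have its own schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per transaction output, with a flag for "already spent".
conn.execute("CREATE TABLE outputs (address TEXT, value INTEGER, spent INTEGER)")
conn.executemany(
    "INSERT INTO outputs VALUES (?, ?, ?)",
    [
        ("addr_a", 50, 1),
        ("addr_a", 20, 0),
        ("addr_b", 30, 1),
        ("addr_c", 30, 0),
        ("addr_c", 5, 0),
    ],
)

# Sum the unspent outputs per address and keep only non-zero balances.
rows = conn.execute(
    """
    SELECT address, SUM(value) AS balance
    FROM outputs
    WHERE spent = 0
    GROUP BY address
    HAVING SUM(value) > 0
    ORDER BY balance DESC
    """
).fetchall()
print(rows)
```

On a table covering the whole chain, one way to work around the timeouts mentioned above would be to run the same query per address prefix or per block range and merge the partial sums.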

It would be interesting to see just how many wallets are empty or nearly empty. I would share
with you just how few real, up-and-running websites there are on the internet, but you have been pre-programmed
to expect numbers in the billions and would just laugh anyhow. Please PM me if you get the results,
because I keep my ear to the ground.






btctousd81
January 12, 2018, 06:34:16 AM
 #5

I recently did this.

Try rusty-blockparser: https://github.com/gcarq/rusty-blockparser

Use the unspentcsvdump callback.
It writes a text file listing all unspent outputs with address, amount, txid, and output index (nout).
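
Once there is a dump of unspent outputs, turning it into balances is just a sum per address. A minimal sketch, assuming a semicolon-separated file with a header row that has at least address and value columns (the exact layout of the real dump may differ, so check the header first); the sample rows are invented.

```python
import csv
import io
from collections import Counter


def balances_from_utxo_csv(lines, delimiter=";"):
    """Sum the value of every unspent output per address (values in satoshi)."""
    balances = Counter()
    for row in csv.DictReader(lines, delimiter=delimiter):
        balances[row["address"]] += int(row["value"])
    return {address: value for address, value in balances.items() if value > 0}


# Tiny stand-in for the real file; the column names are an assumption.
sample = io.StringIO(
    "txid;indexOut;height;value;address\n"
    "aa;0;1;5000;addr_a\n"
    "bb;1;2;2500;addr_b\n"
    "cc;0;3;1000;addr_a\n"
)
result = balances_from_utxo_csv(sample)
print(result)
```

For the real file, pass an open file object instead of the StringIO, for example open("unspent.csv", newline="").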


zorro_tolerant (OP)
January 12, 2018, 06:50:29 AM
 #6

Thanks for the hint! I will give it a try and come back once I have some results to share.
zorro_tolerant (OP)
January 12, 2018, 06:57:28 AM
 #7

I have statistics for all the addresses ever used in the Bitcoin blockchain up to block 488466, where
my parser throws an exception and quits. I can upload the file somewhere and share the link if you are interested.
Most of the addresses are disposable and empty, probably used by mixers... that's just my guess.

Anyway, thanks for your help!

By the way, the Bitcoin-Qt client uses LevelDB for quick access to unspent transaction outputs.
bitfools
January 12, 2018, 12:59:39 PM
Last edit: January 12, 2018, 01:11:25 PM by bitfools
Merited by LFC_Bitcoin (1)
 #8

Quote from: zorro_tolerant on January 11, 2018, 05:06:36 PM


Very good question. I think the first step for all bitcoin 'hackers' is having the databases.

Snort is a good start; it's bundled with brainflayer. But as you already know, starting from the genesis block and going 1 to N is easy, and around block 400k most parsers break.

Lots of people wrote code in 2013, so my first 'advice' is to work backwards. Anybody can start at zero and bomb at 400,000+; all the SHIT on github can do that, so ...

1.) Get your full node running with txindex=1. This makes the node index every transaction, so you can fetch any of them over RPC with its script already decoded.

2.) Run your parser backwards from block 530,000 (or wherever the tip is now) and decrement. Do the hard part first; it only gets easier.

3.) Addresses are the easy part, but you don't really want them, because the good stuff is the hashed public keys and r-values, which can actually tell you things.

4.) Python is probably best because it's easy to hack, and if you're going to parse the entire BTC blockchain from block 5xx,xxx down to 1, you're going to be doing a lot of hacking.

5.) Don't bother with these databases; they all bomb out on the 200 GB of required data. I have many databases, mainly just .txt and .csv files, and I use Bloom filters as my database front end so that all my queries have near-zero latency. Filling the Bloom filter is a one-time job and acquiring the data is what takes time, but once you have the data, making a decision takes no time at all.

6.) Sadly, it's probably going to be RPC with Python all the way down and then working in JSON. The script is a pain in the ass, as it's not documented except in the C++ Bitcoin Core source, and they break it on every version. If you're running RPC, at least you have a chance of reading all the data down to block one.

7.) You say you want addresses, but once you have them you'll find them more than useless: most addresses hold no bitcoins or have never been used.

8.) You may want to look at snort and brainflayer. They include the 'pristine' list and a list of all addresses with a balance up to about 2015. That's a good place to start, so you have something to look at now. 'Pristine' means coins from blocks mined from day one but never spent.

9.) Again, addresses in themselves are rather worthless. What I find useful is a running system that keeps a Bloom filter fed with all addresses that have a balance. Once a balance goes to zero, that address is cleared from the filter (which needs a counting Bloom filter or a periodic rebuild, since a plain Bloom filter cannot delete entries). That gives me an instant way to tell you whether any address, 1 to N (base 10), has a balance today. This is useful because in practice you are running against the memory pool, looking at addresses, and need to know how to update your filter.

10.) Most useful is deriving private/public key pairs, say on the order of 10 million, and then watching your address Bloom filter to see whether an address is in use; then you know whether to run more software and go down that rabbit hole.
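
The Bloom-filter front end from points 5 and 9 could look roughly like the sketch below. It is a counting variant so that an address can be taken out again when its balance hits zero; the class name, the default sizes and the SHA-256 based hashing are all assumptions chosen for the illustration, not what any particular tool uses.

```python
import hashlib


class CountingBloomFilter:
    """Bloom filter with small counters instead of bits, so entries can be removed."""

    def __init__(self, size=1 << 20, hashes=4):
        self.size = size
        self.hashes = hashes
        self.counters = bytearray(size)  # one 8-bit counter per slot

    def _slots(self, item):
        # Derive several slot numbers by salting the item with the hash index.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for slot in self._slots(item):
            if self.counters[slot] < 255:  # saturate instead of overflowing
                self.counters[slot] += 1

    def remove(self, item):
        if item in self:
            for slot in self._slots(item):
                if 0 < self.counters[slot] < 255:  # saturated counters stay put
                    self.counters[slot] -= 1

    def __contains__(self, item):
        return all(self.counters[slot] for slot in self._slots(item))


funded = CountingBloomFilter()
funded.add("addr_a")
funded.add("addr_b")
funded.remove("addr_a")  # balance of addr_a went to zero
still_funded = "addr_b" in funded
emptied = "addr_a" in funded
print(still_funded, emptied)
```

A hit only means "probably funded" (false positives are possible), so anything the filter flags still needs a lookup in the real data; a miss is definite.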

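I think I'm saying more than is needed; the parsing itself is just Python:

import base64, json, urllib.request  # stdlib only, or use your favorite RPC lib from github

RPC_URL = "http://127.0.0.1:8332"  # Bitcoin Core's default mainnet RPC port
RPC_AUTH = base64.b64encode(b"user:password").decode()  # placeholder rpcuser:rpcpassword

def rpc(method, *params):
    """Send one JSON-RPC call to the local node and return its result."""
    body = json.dumps({"jsonrpc": "1.0", "id": 0, "method": method, "params": list(params)}).encode()
    req = urllib.request.Request(RPC_URL, body, {"Authorization": "Basic " + RPC_AUTH})
    return json.load(urllib.request.urlopen(req))["result"]

# yes, we start with the most recent blocks and work back to the dinosaur age
for height in range(rpc("getblockcount"), -1, -1):
    block = rpc("getblock", rpc("getblockhash", height), 2)  # verbosity 2 = decoded transactions
    for tx in block["tx"]:
        for out in tx["vout"]:
            if out["value"] > 1:  # value is in BTC, so > 100M satoshi; you said high value, right?
                spk = out["scriptPubKey"]  # older nodes return "addresses", newer ones "address"
                print(tx["txid"], out["value"], spk.get("address") or spk.get("addresses"))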

Easy peasy, what's to say? Nothing to it ... the problem is that it takes a long time. I mean, getting your list is the easy part.

Sure, you can read the raw blkNNNNN.dat files from N down to 1, read the raw script, and get the data, but you will find a mess of spaghetti code that drives you insane. It's not 'Python-like' to fuss with hexadecimal, and it's no fun in C/C++ given that the entire legacy bitcoin code from day one is a KLUDGE, a hack, a mess, and sucks real bad.

Well, I make it sound sort of easy. Addresses and values come in the JSON, but when you want R/S values, public keys, and hashed keys, you must decode the asm/hex script, which means you must read the C++ source, because it's the only place that documents the script bytes (03/02/01/N/R, ...).
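
For the most common case, pulling R, S and the public key out of the hex script is not too bad. The sketch below handles only a standard pay-to-pubkey-hash scriptSig (one pushed DER signature followed by one pushed public key) and rejects everything else; the function name is made up, and the test bytes are synthetic filler, not a real signature or key.

```python
def parse_p2pkh_scriptsig(script_hex):
    """Split a standard P2PKH scriptSig into (r, s, sighash_type, pubkey_hex)."""
    script = bytes.fromhex(script_hex)
    sig_len = script[0]  # opcodes 0x01..0x4b push that many bytes
    sig = script[1:1 + sig_len]
    key_len = script[1 + sig_len]
    pubkey = script[2 + sig_len:2 + sig_len + key_len]
    if not (0 < sig_len <= 0x4B and 0 < key_len <= 0x4B):
        raise ValueError("not a pair of direct pushes")
    if 2 + sig_len + key_len != len(script):
        raise ValueError("unexpected extra or missing bytes")
    # DER layout: 0x30 total_len 0x02 r_len r 0x02 s_len s, then one sighash byte.
    if sig[0] != 0x30 or sig[2] != 0x02:
        raise ValueError("signature is not DER encoded")
    r_len = sig[3]
    r = int.from_bytes(sig[4:4 + r_len], "big")
    if sig[4 + r_len] != 0x02:
        raise ValueError("missing integer tag for s")
    s_len = sig[5 + r_len]
    s = int.from_bytes(sig[6 + r_len:6 + r_len + s_len], "big")
    return r, s, sig[-1], pubkey.hex()


# Synthetic example: filler bytes in the right layout.
r_bytes, s_bytes = b"\x11" * 32, b"\x22" * 32
der = b"\x30\x44\x02\x20" + r_bytes + b"\x02\x20" + s_bytes
signature = der + b"\x01"         # trailing 0x01 = SIGHASH_ALL
pubkey = b"\x02" + b"\x33" * 32   # 0x02/0x03 prefix = compressed public key
script_hex = (bytes([len(signature)]) + signature + bytes([len(pubkey)]) + pubkey).hex()

r, s, sighash, pubkey_hex = parse_p2pkh_scriptsig(script_hex)
print(hex(r), hex(s), sighash, pubkey_hex)
```

Multisig, P2SH redeem scripts and SegWit witnesses put their signatures elsewhere or in a different order, so each of those needs its own branch.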

Besides, addresses in themselves are useless; most addresses you gather have no meaning. Here, I will give you some numbers.

I parse the blockchain from N down to 1 (500k blocks), with about 4,000 transactions per block and about 2-3 addresses per tx, and I end up with about 20 million addresses. Of those, maybe 5-10% are interesting and less than 0.1% have a balance (remember, people are told not to use the same address twice).

**

I have said it before and I will say it again: NONE of the CODE on github works. It's all shit and none of it is maintained, as kids like BUTERIN seem to get bored 1-2 years after they write their library and get 'famous', then move on and never look back. Some of these libs are just out-and-out read-only and unmaintainable. I have spent MONTHS trying to get ABE (and many other of these so-called bitcoin parser databases) to work, just to realize that it's hard-coded to a specific BTC fork, which means it is worthless if you want to parse 1 to N.

I wish I could say there was a fast way to parse. Even ABE, sure, works great from 1 to 410,000, and then it stops and never works again, because the script is so convoluted and the code is so tightly wound around early BTC data structures that it's all hopeless. SegWit and all these new TX scripts seem to have broken all the shit libs.

I find that most of the time I have to parse the script myself. That's why I have one parser for each task and don't bother with one-size-fits-all; it's too much hacking. (But it's easy: as shown above, writing your own parser to gather all high-value addresses on BTC takes only about 10 lines of Python.)

One parser gets ALL the addresses, another gets all the public keys and their hashes, and another gets all the R and S values for ECDSA hacking ... and there are other databases for other good stuff, like dates and times of unusual transactions. All these different databases go into different Bloom filters, and in production all my main code just uses the Bloom filters.

I might add that the Bloom filters should be updated every 10 minutes, or actually sooner, because the early bird is always two steps ahead of consensus :)

bitfools
January 12, 2018, 01:24:46 PM
 #9

Quote from: Anti-Cen on January 12, 2018, 12:21:02 AM
If you run a full node and wait for the whole chain to download, then maybe someone can knock up some code for you that will
talk RPC to the node on your local machine, and that should give you what you need.

Some blocks, I am told, are really odd to read and understand.


A block is just a list of 'transactions', and the transactions can be strange: every tx can carry opcode script of variable length, which means you must really parse and can't take any shortcuts in this business.

So you have your code that processes all transactions, but you always keep a 'catch', a fall-through for any tx you don't understand, and print that txid (hexadecimal). Then go to chain.so (raw mode), which is better than blockchain.info, and inspect that tx. It's better this way than using the Python debugger; usually a one-second glance tells you what code mod you need to make in your Python tx parser. (Again, block parsing is tx parsing; parsing a block isn't really parsing, it's just walking a list.)
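
A minimal sketch of that fall-through pattern, assuming the output scripts are already available as hex. The byte patterns are the standard templates for the common output types; the function names and the sample txids are made up, and anything not recognized (bare multisig, for example) lands in the unknown list for manual inspection.

```python
def classify_script_pubkey(script_hex):
    """Return (kind, payload_hex) for the common output script templates."""
    s = bytes.fromhex(script_hex)
    if len(s) == 25 and s[:3] == b"\x76\xa9\x14" and s[23:] == b"\x88\xac":
        return "p2pkh", s[3:23].hex()      # DUP HASH160 <20 bytes> EQUALVERIFY CHECKSIG
    if len(s) == 23 and s[:2] == b"\xa9\x14" and s[22:] == b"\x87":
        return "p2sh", s[2:22].hex()       # HASH160 <20 bytes> EQUAL
    if len(s) == 22 and s[:2] == b"\x00\x14":
        return "p2wpkh", s[2:].hex()       # witness v0, 20-byte key hash
    if len(s) == 34 and s[:2] == b"\x00\x20":
        return "p2wsh", s[2:].hex()        # witness v0, 32-byte script hash
    if len(s) in (35, 67) and s[0] == len(s) - 2 and s[-1] == 0xAC:
        return "p2pk", s[1:-1].hex()       # <pubkey> CHECKSIG
    if s[:1] == b"\x6a":
        return "op_return", s[1:].hex()    # provably unspendable data carrier
    return "unknown", script_hex           # the fall-through


def scan(outputs):
    """outputs: iterable of (txid, script_hex); returns (recognized, unknown_txids)."""
    recognized, unknown_txids = [], []
    for txid, script_hex in outputs:
        kind, payload = classify_script_pubkey(script_hex)
        if kind == "unknown":
            unknown_txids.append(txid)     # look these up by hand in a block explorer
        else:
            recognized.append((txid, kind, payload))
    return recognized, unknown_txids


sample_outputs = [
    ("tx1", "76a914" + "11" * 20 + "88ac"),
    ("tx2", "0014" + "22" * 20),
    ("tx3", "51"),  # a lone OP_1: not one of the templates above
]
recognized, unknown_txids = scan(sample_outputs)
print(recognized, unknown_txids)
```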

Well, if you can't code, if you're not a programmer, you simply can't do this shit, because none of the software out there works for the entire blockchain, NONE, NADA. If you pay somebody to write a parser and it breaks (and it will), you will need an infinite amount of money to feed your programmer. And I repeat, NONE of the stuff on the internet works; it is all garbage, though sometimes you can find a good place to start.

If you just want to gather addresses, then you don't need to parse: just run through the block's tx list in JSON mode and pull out the 'addresses'. But if you want public keys or ECDSA values, then you must parse. Multi-sig, SegWit, and all the other protocols have various args and numbers of INs/OUTs, and it's all a convoluted mess.
bitfools
January 12, 2018, 01:50:12 PM
 #10

Quote from: btctousd81 on January 12, 2018, 06:34:16 AM
I recently did this.

Try rusty-blockparser: https://github.com/gcarq/rusty-blockparser

Use the unspentcsvdump callback.
It writes a text file listing all unspent outputs with address, amount, txid, and output index (nout).



https://github.com/mikispag/rusty-blockparser.git

OK, let's take this case. He hasn't touched his code in 3 months, and the CAVEAT says that the code is only tested up to block 393k, which was January 2016, so that's two-year-old data at best.

So what would be the point of any of this?

I know I played with 'rusty' for a few hours and quickly figured out it was more than worthless.

It would be so useful if people actually invested time in these code submissions and understood that they don't work.

I think the OP wants ALL addresses with value from the BIRTH of BTC to NOW, today, and there is no turnkey, free block parser out there that I know of that does this. Almost all of them are 4+ years old and have never been maintained.

I'm just commenting here to save some time for people who actually venture off to 'rusty'. At best the author may have had good intentions, but more than likely this is a game of bitcoin donations set up on GITHUB, where people do minimal development and hope some idiot tips them well. Sadly, this software doesn't exist.

Probably the best software ever for parsing is SNORT, but it requires a machine with 32+ GB of RAM, and I don't know if it works past block 400k, as I don't have any machines with that much RAM to get past that point. Sadly, the guy who wrote snort did it all in memory, and now that BTC is 200 GB, well, it doesn't work.

....

In summary, the database problem of collecting addresses is the EASIEST problem in the game of hacking bitcoin and can be written in about 10 lines of Python. Leave it at that and just DO IT.
zorro_tolerant (OP)
January 12, 2018, 05:51:20 PM
 #11

Quote from: bitfools on January 12, 2018, 12:59:39 PM



Hats off to you for your comprehensive answer, and huge respect! It's very interesting to read, as I have the same frustration
with bitcoin blocks and their protocols, which have evolved over time without proper tooling around them, only some
enthusiasts who wrote tools for fun or for tips.

I have seen at least 4 different block versions and am not sure whether they are backward compatible. For instance, reading block headers only, I ran into
an error with a very lightweight parser (https://github.com/tenthirtyone/blocktools) in blk00976.dat:
########## Block Header ##########
Version:            536870914
Previous Hash    000000000000000000520200e29fade6f8b03e9eeb57eefd8e7004088a055c7f
Merkle Root    f8e0556bd0ce982a118165a22e18de27ea97a7ac239a9535d203beee598b25b6
Time               2017-08-24 02:15:22
Difficulty            402731232
Nonce       4273024596
##### Tx Count: 2280

The next block throws an exception:
blocktools/blocktools.py", line 11, in uint4
    return struct.unpack('I', stream.read(4))[0]
struct.error: unpack requires a string argument of length 4
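
Two things can be read off that output, sketched below under assumptions. The version 536870914 is 0x20000002, a BIP9 "version bits" value rather than a new block format, and the struct error just means read(4) returned fewer than 4 bytes. The header is dated 2017-08-24, the day SegWit activated (block 481824), so one plausible but unconfirmed cause is that a pre-SegWit parser mis-reads the new transaction layout and loses its place in the file; hitting the end of the .dat file would produce the same error. The helper names here are made up and are not part of blocktools.

```python
import io
import struct


def describe_version(version):
    """Return (uses_bip9_version_bits, list of signalled bit positions)."""
    uses_bip9 = (version >> 29) == 0b001  # top three bits must be 001
    bits = [b for b in range(29) if (version >> b) & 1] if uses_bip9 else []
    return uses_bip9, bits


def read_uint4(stream):
    """Read a little-endian uint32 and fail loudly on a short read."""
    data = stream.read(4)
    if len(data) < 4:
        raise EOFError(f"wanted 4 bytes, got {len(data)}: end of file or a mis-parsed block")
    return struct.unpack("<I", data)[0]


def looks_like_segwit_tx(raw_tx):
    """True if the 4-byte tx version is followed by the SegWit marker 0x00 and flag 0x01."""
    return raw_tx[4:6] == b"\x00\x01"


version_info = describe_version(536870914)
roundtrip = read_uint4(io.BytesIO(bytes.fromhex("02000020")))
print(version_info, roundtrip)
```

A legacy parser reads that 0x00 marker byte as "zero inputs", which throws off every length it reads afterwards, so a check like looks_like_segwit_tx before parsing the inputs is one way to detect the case.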

I don't want to spend much time digging into this, because it was just out of curiosity, and I don't want to sit and twist
bits/bytes/words that have changed many times over 9 years.

Your post helped me a lot to sort these things out. Thanks a lot again!!!
Fantic666
January 30, 2018, 12:34:30 PM
 #12

@bitfools

Dear Sir,

I am trying to read and understand some of your posts about indexing and using BTC data.
My goal is simple: I have a list of around 100 million BTC addresses and want to compare them against addresses with a balance.
I am using C#. I have the complete blockchain indexed into SQL, parsing block by block, but now the issues start... it's slow as hell.
I am very new to blockchain parsing, with so much to learn and read.

May I ask you some questions?
How did you get the addresses with balances quickly?
How many entries do you compare against the Bloom filters?
How do you keep the address balances up to date?

Many thanks for your time and efforts.

Best Regards
Tim
jrian
January 30, 2018, 01:08:34 PM
 #13

Fantic666, just check how this parser works: https://github.com/znort987/blockparser


But for me it only works up to block 481822. Is that because of SegWit? Does anyone know?

Start the parser with: ./parser allBalances -a 481822
