Bitcoin Forum
November 09, 2024, 09:27:12 PM *
News: Latest Bitcoin Core release: 28.0 [Torrent]
 
   Home   Help Search Login Register More  
Pages: « 1 [2] 3 4 »  All
  Print  
Author Topic: A bitcoin blockchain parser in a few (thousand) lines of C++  (Read 16954 times)
piotr_n
Legendary
*
Offline Offline

Activity: 2055
Merit: 1359


aka tonikt


View Profile WWW
July 05, 2013, 07:55:40 PM
 #21

In the transaction header there are (n1) input scripts and (n2) output scripts; how do I know which input script to run on the virtual-machine prior to running a particular output script?
You take output script from a different transaction - the one pointed by txid+vout

Check out gocoin - my original project of full bitcoin node & cold wallet written in Go.
PGP fingerprint: AB9E A551 E262 A87A 13BB  9059 1BE7 B545 CDF3 FD0E
etotheipi
Legendary
*
expert
Offline Offline

Activity: 1428
Merit: 1093


Core Armory Developer


View Profile WWW
July 05, 2013, 08:01:36 PM
 #22

>>You run the script for the input and then run the script for the output.

Ok, I had previously asked in this thread how the input and output scripts related to each other and I did not get a clear answer.


In the transaction header there are (n1) input scripts and (n2) output scripts; how do I know which input script to run on the virtual-machine prior to running a particular output script?

Thanks,

John

In a given transaction they are unrelated.   For each TxIn you look at its OutPoint and go fetch the TxOut it references from a previous transaction in the block database.  You use that TxOut.  The TxOut in this Tx will be used as inputs in a future transaction (when the recipient spends the money)

Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
TierNolan
Legendary
*
Offline Offline

Activity: 1232
Merit: 1104


View Profile
July 05, 2013, 08:03:56 PM
 #23

In the transaction header there are (n1) input scripts and (n2) output scripts; how do I know which input script to run on the virtual-machine prior to running a particular output script?

No, you run the input script from the spending transaction and then the output script from the source transaction.

If transaction 123456789 input 4 references transaction 987654321, output 7, then those are the 2 scripts you have to use together.

Also, if they are standard transactions, you don't even need to bother, the address will be in a constant location.

1LxbG5cKXzTwZg9mjL3gaRE835uNQEteWF
piotr_n
Legendary
*
Offline Offline

Activity: 2055
Merit: 1359


aka tonikt


View Profile WWW
July 05, 2013, 08:08:20 PM
 #24

so here is where you come to the point where you understand that you need to build and keep a database of unspent txouts - a big one.
and also you will need sha256, each time before putting a record into it Smiley

EDIT: let my know how many lines of code will your parser have, at the end Smiley

Check out gocoin - my original project of full bitcoin node & cold wallet written in Go.
PGP fingerprint: AB9E A551 E262 A87A 13BB  9059 1BE7 B545 CDF3 FD0E
jratcliff63367 (OP)
Member
**
Offline Offline

Activity: 82
Merit: 10


View Profile
July 05, 2013, 08:43:16 PM
Last edit: July 05, 2013, 08:55:34 PM by jratcliff63367
 #25

Yeah, I'm starting to no longer have fun doing this.  The relationship between the inputs and outputs are about as clear as mud.  Well, let me be clear, I get that each output needs an input, but hell if I know how to figure out which outputs refer to which inputs.  Just a bit ago, I wanted to figure out how to convert the binary representation of a public-key to ASCII and even that it had a log of dependencies; even the 'no library dependency version' depended on some network thing called 'htonl' and on an encryption routine as well.  That's a lot of code drug in just to convert some binary to ASCII.

I still do not believe that this needs to be that difficult, but everyone seems to start by including massive libraries like CrypoPP and boost, networking code, etc. so it's not much fun trying to dig through this stuff.

I like to write code with as few dependencies as humanly possible and often in what I call 'snippet' form; one header file and one CPP which demonstrates how to do just a single thing.  But I can't find anything like that, instead it's these massive libraries which build on top of each other in layers so that breaking it out into 'snippet' form is a major task.

I also think that the code snippet I started with; which reads each block into memory, is valuable and useful, but converting that raw date into paired input/output transactions seems to be a bit of a mystery to me so far.

[*Edit*]

I may not give up just yet.  I have some time this weekend and I finally was able to get BitcoinArmory 'cppForSwig' to build and run on my machine parse the bitcoin blockchain on my hard-drive.  I had to change a few things to get it to work; but it is working now. Hopefully I will be able to unravel the mystery of inputs-outputs and transactions by stepping through this code.

It does use CryptoPP but perhaps I can strip out just the (hopefully) handful of routines which are actually used.

John
piotr_n
Legendary
*
Offline Offline

Activity: 2055
Merit: 1359


aka tonikt


View Profile WWW
July 05, 2013, 08:48:44 PM
Last edit: July 05, 2013, 09:04:21 PM by piotr_n
 #26

Oh, don't be a pussy.
This is exactly where the fun starts.

It all can be done - maybe easier than you imagine ATM.
As someone had said here, you can just keep the unspent outputs in memory, in maps. Hashmaps are standard in C++.
And sha256 function is a simple code that you can include in your sources.
You don't really need any dependencies, if you like writing code Smiley

And no, output does not need an input - only input needs an output.
So you only need to keep track of unspent outputs, in order to be able to 'spend' them by processing further inputs in the block chain.
It's not a really complex problem to solve.

Check out gocoin - my original project of full bitcoin node & cold wallet written in Go.
PGP fingerprint: AB9E A551 E262 A87A 13BB  9059 1BE7 B545 CDF3 FD0E
DeathAndTaxes
Donator
Legendary
*
Offline Offline

Activity: 1218
Merit: 1079


Gerald Davis


View Profile
July 05, 2013, 09:05:34 PM
 #27

Yeah, I'm starting to no longer have fun doing this.  The relationship between the inputs and outputs are about as clear as mud.  Well, let me be clear, I get that each output needs an input, but hell if I know how to figure out which outputs refer to which inputs.  Just a bit ago, I wanted to figure out how to convert the binary representation of a public-key to ASCII and even that it had a log of dependencies; even the 'no library dependency version' depended on some network thing called 'htonl' and on an encryption routine as well.  That's a lot of code drug in just to convert some binary to ASCII.

They don't.  That might be why you are confused.

The only relationship between inputs and outputs in a tx is is that the sum of the inputs must be equal to or greater than the sum of the outputs.  The tx fee is the difference between the sum of the inputs and the sum of the outputs.

So some rules to get you started.

1) The inputs of all txs (except coinbase = generation) are the output of prior transactions.
2) There is a 1:1 relationship between the INPUT of a given transaction and the output of a PRIOR transaction.
3) In a given transaction there is no relationship between the inputs and outputs.
4) For a tx to be valid the sum of the inputs must be equal to or greater than the sum of the outputs.
5) The difference between inputs and outputs is the tx fee.
6) Since a tx can have n inputs and m outputs we refer to a specific output by both the tx hash AND the index.



Quote
but converting that raw date into paired input/output transactions seems to be a bit of a mystery to me so far.

It will forever remain a mystery no matter how many libraries you use until you shake that flawed model.


Lets look at a random transaction.

http://blockexplorer.com/tx/a6d9c176ecb041c2184327b8375981127f3632758a7a8e61b041343efc3bcb6e

In raw form
Quote
{
  "hash":"a6d9c176ecb041c2184327b8375981127f3632758a7a8e61b041343efc3bcb6e",
  "ver":1,
  "vin_sz":1,
  "vout_sz":2,
  "lock_time":0,
  "size":257,
  "in":[
    {
      "prev_out":{
        "hash":"b5045e7daad205d1a204b544414af74fe66b67052838851514146eae5423e325",
        "n":0
      },
      "scriptSig":"304402200e3d4711092794574e9b2be11728cc7e44a63525613f75ebc71375f0a6dd080d02202ef 1123328b3ecddddb0bed77960adccac5bbe317dfb0ce149eeee76498c19b101 04a36b5d3b4caa05aec80752f2e2805e4401fbdbe21be1011dc60c358c5fc4d3bedd1e03161fb4b 3a021c3764da57fee0d73570f3570f1b3dd92a1b06aae968846"
    }
  ],
  "out":[
    {
      "value":"300.00000000",
      "scriptPubKey":"OP_DUP OP_HASH160 0331e5256416bc11ecf9088091f8424819553a10 OP_EQUALVERIFY OP_CHECKSIG"
    },
    {
      "value":"699.99950000",
      "scriptPubKey":"OP_DUP OP_HASH160 4186719d739ae983d8c75a0cb82958e94b7ae81e OP_EQUALVERIFY OP_CHECKSIG"
    }
  ]
}


Now the tx hash is a6d9c176ecb041c2184327b8375981127f3632758a7a8e61b041343efc3bcb6e
The # of inputs is 1 (vin_sz).
The # of outputs is 2 (vout_sz).

The single input of the transaction is referenced here:  
Quote
"in":[
    {
      "prev_out":{
        "hash":"b5045e7daad205d1a204b544414af74fe66b67052838851514146eae5423e325",
        "n":0
      },
      "scriptSig":"304402200e3d4711092794574e9b2be11728cc7e44a63525613f75ebc71375f0a6dd080d02202ef 1123328b3ecddddb0bed77960adccac5bbe317dfb0ce149eeee76498c19b101 04a36b5d3b4caa05aec80752f2e2805e4401fbdbe21be1011dc60c358c5fc4d3bedd1e03161fb4b 3a021c3764da57fee0d73570f3570f1b3dd92a1b06aae968846"
    }

The INPUT for this transaction is the OUTPUT of a PREVIOUS transaction.
Specifically it is the tx with hash b5045e7daad205d1a204b544414af74fe66b67052838851514146eae5423e325.  However a tx can have multiple outputs so how do we know which one of those otuptuts this input refers to?  Simple the tx output index is provided.

Inputs of transactions are outputs of prior transactions.

In this case the input we are looking for is is specifically the output #0 of transaction hash b5045e7daad205d1a204b544414af74fe66b67052838851514146eae5423e325

So lets look at that transaction.  The output #0 of this transaction is 1,000 BTC.
http://blockexplorer.com/tx/b5045e7daad205d1a204b544414af74fe66b67052838851514146eae5423e325

So back to our current transaction (hash ending e325).  There are only 1 input.  The value of the transaction is the sum of the inputs.  In this case a single input of 1,000 BTC.

Now looking at the output section we can see we the two outputs
 "out":[
    {
      "value":"300.00000000",
      "scriptPubKey":"OP_DUP OP_HASH160 0331e5256416bc11ecf9088091f8424819553a10 OP_EQUALVERIFY OP_CHECKSIG"
    },
    {
      "value":"699.99950000",
      "scriptPubKey":"OP_DUP OP_HASH160 4186719d739ae983d8c75a0cb82958e94b7ae81e OP_EQUALVERIFY OP_CHECKSIG"
    }

Output#0 is 300 BTC and output #1 is 699.9995 BTC.
The sum of the outputs is 999.9995 BTC
The sum of the inputs is 1,000.0000 BTC
The tx fee to miners is the difference or 0.0005 BTC.

If you want to decode further which addresses a transaction is sent to well you will need more code.

Lets look at the second output.
Quote
OP_DUP OP_HASH160 4186719d739ae983d8c75a0cb82958e94b7ae81e OP_EQUALVERIFY OP_CHECKSIG

So what is 4186719d739ae983d8c75a0cb82958e94b7ae81e?  It is the RIPEMD160 hash of the public key.  However Bitcoin addresses include a checksum and are converted to Base58.

BTW the public key hash 4186719d739ae983d8c75a0cb82958e94b7ae81e is Bitcoin address 16yTynjmSe5bsRGykDaaCL5bm2pxiEfcqP.

2112
Legendary
*
Offline Offline

Activity: 2128
Merit: 1073



View Profile
July 05, 2013, 10:28:06 PM
 #28

Does anyone know of a C/C++ program someone else has already written that does what I'm trying to do here; just walk the block-chain and track transactions?  All of these code bases have sooo...many dependencies that you dive into crytographic hell trying to decipher them.
Yes, user znort987 did just that. But then he couldn't stand the pressure, deleted the key posts and quit participating in the forum.

His thread starts here:

https://bitcointalk.org/index.php?topic=88584.0

And the source code is still available:

https://github.com/znort987/blockparser

You may want to check his thread and his post to understand a bit of what had happened and avoid any possibility of future stress or nervous breakdown.

Please comment, critique, criticize or ridicule BIP 2112: https://bitcointalk.org/index.php?topic=54382.0
Long-term mining prognosis: https://bitcointalk.org/index.php?topic=91101.0
jratcliff63367 (OP)
Member
**
Offline Offline

Activity: 82
Merit: 10


View Profile
July 05, 2013, 10:48:39 PM
 #29

Thanks, I looked at that before.  At first it seems close, but it has dependencies on openssl, some google hash map template code, and only builds and runs on unix.  I'm working on windows personally.

I suspect I'm really close to having this all figured out.  It's not that I don't get the basic concepts of the transactions etc. I'm just unclear how some of these hashes are computed and how they relate to one another.

John
TierNolan
Legendary
*
Offline Offline

Activity: 1232
Merit: 1104


View Profile
July 05, 2013, 10:56:03 PM
 #30

Thanks, I looked at that before.  At first it seems close, but it has dependencies on openssl, some google hash map template code, and only builds and runs on unix.  I'm working on windows personally.

You need to be able to do sha256, if you want to work out hashes.  You could probably find code for just that.

OpenSSL is also used for all the other crypt stuff, but you don't need that.

You also need the base58 encoder to convert public keys (after hashing with RIPE-160) into bitcoin text addresses.

1LxbG5cKXzTwZg9mjL3gaRE835uNQEteWF
jratcliff63367 (OP)
Member
**
Offline Offline

Activity: 82
Merit: 10


View Profile
July 06, 2013, 07:17:46 PM
 #31

Ok, I believe I am unblocked now on finishing up my parser/transaction tool.  The piece that I was missing was the transaction hash.

Each input refers to a 'transactionHash' however there is no transaction hash stored in the blockchain .dat files.

After some google searching I discovered that the transactionHash is computed by running SHA256 on the raw transaction input buffer.  Through the magic of SHA256 no two transactions are presumed to ever produce identical hashes.  I don't need to know what 'block' the transaction is in, as the transaction data is a complete standalone dataload unto itself, unique in the world of bitcoin.

This was the missing piece of the puzzle for me because I didn't realize that I had to compute this hash when I parsed the input file, at least, if I ever had any hope of binding inputs to the correct transactions.

A couple of general questions.  Are transactions always written to the last block in the block chain?  In other words, old blocks never get re-written?  I would assume this is the case otherwise the blockchain would become fragmented and need garbage collection, etc.

Thanks,

John
DeathAndTaxes
Donator
Legendary
*
Offline Offline

Activity: 1218
Merit: 1079


Gerald Davis


View Profile
July 06, 2013, 07:30:50 PM
 #32

Older blocks are never rewritten but it is possible there are 2 or more competing chains.

Example the last block is block height 123

One miner solves a block with height height of 124 (it references the block hash 123 as the prior blockhash).  We will call this block 124a.
At roughly the same time a different miner solves a different block with a block height 124.  We will call this block 124b.

At this point the network is not in consensus. Some nodes believe block 124a is the next "valid" block and some believe 124b is.

Eventually some node will solve block 125.  If the block is built on the 124a chain then block 124b becomes orphaned. If it is built on the 124b chain then block 124a is orphaned.

If your node had received block 124a before it became orphaned it will be contained in your block history.  When block 124b orphans block 124a it will also be contained in your block history.  You can detect this because both blocks will have the block hash for block 123 as the prior block hash.

In your block parser you can built a simplified system to remove orphans by working backwards.  bitcoind will tell you the hash of the last block of the longest chain.  Working from that block you can locate the prior block hash field in the header and then locate the corresponding block, and work all the way back to the genesis blocks.  Any other blocks are orphans and for the purpose of parsing the historical blockchain can be pruned.


Simple version:
It is possible at any point in time there are multiple conflicting blockchains.  The "longest"* chain is considered the consensus record.
* longest in this instance doesn't mean just highest block height (as that can be spoofed) it means the chain with highest aggregate difficulty.


ProfMac
Legendary
*
Offline Offline

Activity: 1246
Merit: 1002



View Profile
July 06, 2013, 08:59:40 PM
 #33


Simple version:
It is possible at any point in time there are multiple conflicting blockchains.  The "longest"* chain is considered the consensus record.
* longest in this instance doesn't mean just highest block height (as that can be spoofed) it means the chain with highest aggregate difficulty.


I want to produce just such a spool in the bytecoin blockchain.  Someone mined blocks at the rate of 1 every minute or two at difficulty 8,000, and then quit mining when the difficulty climbed to 32,000.  Since that transition to higher difficulty on about May 5 (two months) only about 250 blocks have been mined.  This past week, 1 single block was mined.

I thought it might be interesting to do some experiments on that block chain, to go back to the next to last block at difficulty 8,000 and produce a fork.  I would attempt to repair the chain to the extent possible.  It should be possible to mine new blocks that honored previous payouts (or possibley not, in the case of the grief-miner)


I try to be respectful and informed.
TierNolan
Legendary
*
Offline Offline

Activity: 1232
Merit: 1104


View Profile
July 06, 2013, 09:24:26 PM
 #34

I thought it might be interesting to do some experiments on that block chain, to go back to the next to last block at difficulty 8,000 and produce a fork.  I would attempt to repair the chain to the extent possible.  It should be possible to mine new blocks that honored previous payouts (or possibley not, in the case of the grief-miner)

That is a chain fork, since the new "official" chain would have lower POW.  If you are modifying the client, you could just put a blacklist check for the first high difficulty block.

Presumably, the miner traded his coins before departing?

It is a curse of alt coins that hashing is so variable.

1LxbG5cKXzTwZg9mjL3gaRE835uNQEteWF
ProfMac
Legendary
*
Offline Offline

Activity: 1246
Merit: 1002



View Profile
July 06, 2013, 10:28:24 PM
 #35

I thought it might be interesting to do some experiments on that block chain, to go back to the next to last block at difficulty 8,000 and produce a fork.  I would attempt to repair the chain to the extent possible.  It should be possible to mine new blocks that honored previous payouts (or possibley not, in the case of the grief-miner)

That is a chain fork, since the new "official" chain would have lower POW.  If you are modifying the client, you could just put a blacklist check for the first high difficulty block.

Presumably, the miner traded his coins before departing?

It is a curse of alt coins that hashing is so variable.

I don't know if he traded them or not.  If he did, I would want to have the miner's block payout to whatever address has coins derived from his.  That will take some analysis, which is how I ended up in this thread.  I have mixed feelings about not replacing his coins.

I also wanted to choose what times-tamps to use on the forked blocks, instead of "now."  I haven't read the client's source.

I try to be respectful and informed.
TierNolan
Legendary
*
Offline Offline

Activity: 1232
Merit: 1104


View Profile
July 06, 2013, 10:31:32 PM
 #36

I don't know if he traded them or not.  If he did, I would want to have the miner's block payout to whatever address has coins derived from his.  That will take some analysis, which is how I ended up in this thread.  I have mixed feelings about not replacing his coins.

I also wanted to choose what times-tamps to use on the forked blocks, instead of "now."  I haven't read the client's source.

If you are changing the client anyway, then you could just force a difficulty change. 

That is what some coins do after they lose lots of hashing power.

1LxbG5cKXzTwZg9mjL3gaRE835uNQEteWF
ProfMac
Legendary
*
Offline Offline

Activity: 1246
Merit: 1002



View Profile
July 06, 2013, 11:54:15 PM
 #37


If you are changing the client anyway, then you could just force a difficulty change. 

That is what some coins do after they lose lots of hashing power.

That would let us get back to hashing without forking the chain.  I think a chain fork is a very big step.


I try to be respectful and informed.
jratcliff63367 (OP)
Member
**
Offline Offline

Activity: 82
Merit: 10


View Profile
July 07, 2013, 12:00:27 AM
 #38

For those of you interested in seeing the progress of this project; I just uploaded my current work.  The report of what's changed and what this update contains is located at this link on my personal coding blog.  

http://codesuppository.blogspot.com/2013/07/work-in-progress-on-bitcoin-blockchain.html

This afternoon I started focusing on getting my 'parser' to display the same data which shows up on the block-explorer website; and I made good progress on that.

I was just getting ready to make it print out the public-key in ASCII when I found it that doing that is a giant pain in the ass.

You need to have access to a 'big-number' library.  I refactored the version that is in 'cbitcoin' but it's still not producing a valid ASCII string for a given binary input address.  I'm not sure why, but it's late in the day and that's enough for now.

In keeping with my personal coding style/preference all of the code is in 'snippet' form.

That means it's got a single header file which presents a minimal public API and then a single CPP with the implementation of that API.  There are no external dependencies.

So far, I've got the main BlockChain parser in snippet form, also hashes for SHA256 and RIPEMD160, and a snippet that is supposed to convert a bitcoin address from ASCII to binary and from binary to ASCII, but it's not yet fully working.  I have the start of a bitcoin scripting engine; but it doesn't parse much yet.

Thanks for all the help, I don't know when I'll find time to keep working on this, but so far it's been fun.

John
TierNolan
Legendary
*
Offline Offline

Activity: 1232
Merit: 1104


View Profile
July 07, 2013, 12:21:45 AM
 #39

That would let us get back to hashing without forking the chain.  I think a chain fork is a very big step.

Hmm, this is kind of off-topic for the thread.  In the end, a non-fork option would mean that you need to match the POW done.

I was just getting ready to make it print out the public-key in ASCII when I found it that doing that is a giant pain in the ass.

You could print out in hex, as a start.

Quote
You need to have access to a 'big-number' library.  I refactored the version that is in 'cbitcoin' but it's still not producing a valid ASCII string for a given binary input address.  I'm not sure why, but it's late in the day and that's enough for now.

It is always worth checking that you have the endian-ness right.  Also, there are version numbers too.

1LxbG5cKXzTwZg9mjL3gaRE835uNQEteWF
piotr_n
Legendary
*
Offline Offline

Activity: 2055
Merit: 1359


aka tonikt


View Profile WWW
July 07, 2013, 12:32:39 AM
 #40

printing the address from txout is not something that you can really do in a reliable and ultimate way.
the outputs scripts can be anything. most of them follow the usual patters, but they don't have to really follow any limited number of patterns - in reality there is no such thing as output address of a bitcoin transaction, there is only an output script - if you can hexdump this one, for some people it would be as good as printing some address.

Check out gocoin - my original project of full bitcoin node & cold wallet written in Go.
PGP fingerprint: AB9E A551 E262 A87A 13BB  9059 1BE7 B545 CDF3 FD0E
Pages: « 1 [2] 3 4 »  All
  Print  
 
Jump to:  

Powered by MySQL Powered by PHP Powered by SMF 1.1.19 | SMF © 2006-2009, Simple Machines Valid XHTML 1.0! Valid CSS!