Interesting timing for me! About a week ago I decided I was finally going to start learning about all the code underneath the bitcoin protocol and software. Except I wasn't going to actually examine any code (I don't know much C++), I was going to read lots of technical articles and write my own code from the bottom up. Also I was going to do it in Python. Which I had not yet learned. Why? "Because it's there".
So anyway a week later I've read the Satoshi paper, the protocol-rules and protocol-spec docs on the wiki, that very helpful '285 bytes that changed the world', and this:
http://www.michaelnielsen.org/ddi/how-the-bitcoin-protocol-actually-works/ . Plus I've learned some python. And here's what I've got to show for it:
#!/usr/bin/python
from time import ctime
def behex(x):
"Converts the given little-endian slice of bytes into big-endian hex digits"
return x[::-1].encode('hex')
def varint(x):
"""Converts the bytes-slice to a list: first number is the number of bytes"""
"""the varint takes up. Second number is the actual int value. Recommend"""
"""only passing 9 bytes to this function, but extras will be ignored"""
a = int(x[0].encode('hex'), 16)
if a < 253: return (1, a)
elif a == 253: return (3, int(behex(x[1:3]), 16))
elif a == 254: return (5, int(behex(x[1:5]), 16))
elif a == 255: return (9, int(behex(x[1:9]), 16))
class TrInput:
"""A single input in a single transaction. We create it from a slice"""
"""of bytes, but do NOT make assumptions about where it ends. We"""
"""infer that from script length, and further bytes are ignored."""
def __init__(self, data):
self.hash = data[0:32] ; self.index = data[32:36]
(off1, scrlen) = varint(data[36:45])
off1 = off1 + 36 ; off2 = off1 + scrlen ; self.size = off2 + 4
self.script = data[off1:off2] ; self.seqnum = data[off2:self.size]
class TrOutput:
"""A single output in a single transaction. We create it from a slice"""
"""of bytes, but do NOT make assumptions about where it ends. We """
"""infer that from script length, and further bytes are ignored."""
def __init__(self, data):
self.satval = data[0:8] ; (offset, scrlen) = varint(data[8:17])
offset = offset + 8 ; self.size = offset + scrlen
self.script = data[offset:self.size]
class Transaction:
"""Data is a slice of bytes, should start with version"""
"""but can end anywhere, excess bytes are ignored."""
def __init__(self, data):
self.version = int(behex(data[0:4]), 16)
(off1, ninps) = varint(data[4:13]) ; off1 = off1 + 4 ; self.inp = []
for i in range(ninps):
self.inp.append(TrInput(data[off1:]))
off1 = off1 + self.inp[i].size
(off2, nouts) = varint(data[off1:off1+9])
off2 = off2 + off1 ; self.out = []
for i in range(nouts):
self.out.append(TrOutput(data[off2:]))
off2 = off2 + self.out[i].size
self.size = off2 + 4 ; self.locktime = behex(data[off2:self.size])
class Block:
def __init__(self, blkdata):
self.size = len(blkdata) ; self.version = int(behex(blkdata[0:4]), 16)
self.prevhash = blkdata[4:36] ; self.merkle = blkdata[36:68]
self.tstamp = int(behex(blkdata[68:72]), 16)
self.target = behex(blkdata[72:76])
self.nonce = behex(blkdata[76:80])
(offset, self.ntrans) = varint(blkdata[80:89])
offset = offset + 80 ; self.trans = []
for i in range(self.ntrans):
self.trans.append(Transaction(blkdata[offset:]))
offset = offset + self.trans[i].size
def checkmagicbytes(fname):
"Reads 4 bytes from current position in file, checks they're 'magic'"
x = fname.read(4) # why can't I ';' these two lines? (and next two)
if len(x) == 0: return 0 # ???
y = int(behex(x), 16)
if y != 0xD9B4BEF9: return 0
return 1
def target2diff(t):
"Converts the target in 4-byte hexstring form into difficulty (float)"
# MAY not agree with diff given in block explorer for high diffs - check!
i1 = int(t, 16) ; i2 = float(2**(248 - i1/(2**21)))/(i1 % (2**24))
return i2 * 65535/65536 # I THINK that fudge factor fixes things...
bn = 0 ; fn = 0 ; chain = []
while True:
fname = "/home/chris/.bitcoin/blocks/blk{:0=5}.dat".format(fn)
try: bcfile = open(fname,"rb")
except IOError: break
while checkmagicbytes(bcfile): # issue about two failure modes!
bkdata = bcfile.read(int(behex(bcfile.read(4)),16))
chain.append(Block(bkdata)) ; bn = bn + 1
if bn > 100000: break # just for memory purposes for now
fn = fn + 1 ; bcfile.close()
if bn > 100000: break # ditto
bn = bn - 1 ; fn = fn - 1 ; print "Chain ends: block", bn, "file", fn
opt = int(raw_input("Enter a block number:"))
while opt <= bn:
print "Version:", chain[opt].version
print "Prev hash:", behex(chain[opt].prevhash)
print "Merkle root:", behex(chain[opt].merkle)
print "Timestamp:", ctime(chain[opt].tstamp)
print "Difficulty:", target2diff(chain[opt].target)
print "Nonce:", chain[opt].nonce
trs = chain[opt].trans ; n = chain[opt].ntrans
print "# of transactions:", n, "size:", chain[opt].size
for t in range(n):
print "Transaction", t, "Version", trs[t].version,
print "Locktime", trs[t].locktime ; ni = len(trs[t].inp)
# input and output data still needs to be presented readably...
for i in range(ni):
inp = trs[t].inp[i]
print "Input:", i
print "Hash:", inp.hash
print "Index:", inp.index
print "Script:", inp.script
print "Seqnum:", inp.seqnum
no = len(trs[t].out)
for o in range(no):
out = trs[t].out[i]
print "Output:", o
print "Value:", out.satval
print "Script:", out.script
opt = int(raw_input("Enter another block number:"))
Feel free to use it, and any constructive comments/help/tips most welcome. It doesn't do too much. You need the blockchain as downloaded by the official client, it'll read the first 100000 blocks (or more if you remove the brakes, but then be careful about your memory filling up). Then you can enter a block height number and see what's in it. The next things on my to-do list are (a) understand the scripts in the inputs and outputs, (b) shove SHA256 stuff in so I can compute block hashes, merkle roots, transaction ids etc, and (c) use all that to validate the blockchain (at least the strictly linear version - no forks for me yet). Then I guess (d) - work out the most compact way of storing the relevant info from the first n blocks (where, say, n=current block height - a few hundred) in memory. I think they call this 'pruning'.
And I've only NOW started reading John Ratcliff's 'code suppository'
. It looks good!