Bitcoin Forum
August 16, 2022, 12:47:18 AM
 News: Latest Bitcoin Core release: 23.0 [Torrent]
 Home Help Search Login Register More
 Pages: [1]
 Author Topic: [CORRECTED] Script calculates 71% freeable transactions  (Read 2087 times)
John Tobey
Hero Member

Offline

Activity: 481
Merit: 519

 May 23, 2011, 07:05:33 AMLast edit: May 23, 2011, 02:47:54 PM by johntobey253

"nobody has done studies of the existing block chain to see how much space could be reclaimed."  --https://en.bitcoin.it/wiki/Scalability

So I gave it a go.  I hope someone will check my logic, especially as I am new to Python.

\$ time src/bitcointools/freeable.py
201085 of 544296 tx freeable (corrected: 387616 of 544296 tx)
63328760 of 168627001 tx bytes freeable (corrected: 123285900 of 168627001 tx bytes)

real   0m47.131s
user   0m43.799s
sys   0m2.744s
\$ ls -l .bitcoin/blk0*.dat
-rw------- 1 jtobey jtobey 180119823 2011-05-23 00:42 .bitcoin/blk0001.dat

(edited: fixed bug http://forum.bitcoin.org/index.php?topic=9461.msg137059#msg137059) diff of bitcointools:
Code:
diff --git a/deserialize.py b/deserialize.py
index fe0cb09..a67645b 100644
--- a/deserialize.py
+++ b/deserialize.py
@@ -75,6 +75,7 @@ def deserialize_TxOut(d, owner_keys=None):

def parse_Transaction(vds):
d = {}
+  start = vds.read_cursor  # XXX breaks BCDataStream abstraction, do we care?
d['txIn'] = []
@@ -85,6 +86,7 @@ def parse_Transaction(vds):
for i in xrange(n_vout):
d['txOut'].append(parse_TxOut(vds))
return d
def deserialize_Transaction(d, transaction_index=None, owner_keys=None):
result = "%d tx in, %d out\n"%(len(d['txIn']), len(d['txOut']))
diff --git a/freeable.py b/freeable.py
new file mode 100755
index 0000000..c79f458
--- /dev/null
+++ b/freeable.py
@@ -0,0 +1,87 @@
+#!/usr/bin/env python
+#
+# Read the block database, find out how many transactions
+# could be purged and how many bytes they take up.
+# TODO: find out how many Merkle hashes could be purged.
+#
+from bsddb.db import *
+import logging
+import os
+import sys
+
+from BCDataStream import *
+from block import scan_blocks
+from deserialize import parse_Block
+from util import determine_db_dir, create_env
+import Crypto.Hash.SHA256 as SHA256
+
+import binascii  # debugging
+
+def main():
+  import optparse
+  parser = optparse.OptionParser(usage="%prog [options]")
+                    help="Look for files here (defaults to bitcoin default)")
+  (options, args) = parser.parse_args()
+
+    db_dir = determine_db_dir()
+  else:
+
+  try:
+    db_env = create_env(db_dir)
+  except DBNoSuchFileError:
+    logging.error("Couldn't open " + db_dir)
+    sys.exit(1)
+
+  blockfile = open(os.path.join(db_dir, "blk%04d.dat"%(1,)), "rb")
+  block_datastream = BCDataStream()
+  block_datastream.map_file(blockfile, 0)
+
+  blocks = []
+  def gather_stats(block_data):
+    block_datastream.seek_file(block_data['nBlockPos'])
+    blocks.append(parse_Block(block_datastream))
+    return True
+
+  scan_blocks(db_dir, db_env, gather_stats)
+  blocks.reverse()
+  print 'loaded ', len(blocks), ' blocks'
+
+  tx = {}
+  total_tx = 0
+  freeable_tx = 0
+  total_bytes = 0
+  freeable_bytes = 0
+
+  for data in blocks:
+    coinbase = True
+    for txn in data['transactions']:
+      tx_hash = SHA256.new(SHA256.new(txn['tx']).digest()).digest()
+      #print '> ', binascii.hexlify(tx_hash)
+      tx_bytes = len(txn['tx'])
+      tx[tx_hash] = (tx_bytes, len(txn['txOut']))
+      total_tx += 1
+      total_bytes += tx_bytes
+
+      if coinbase:
+        coinbase = False
+      else:
+        for txin in txn['txIn']:
+          #print '< ', binascii.hexlify(txin['prevout_hash'])
+          (bytes, live) = tx[txin['prevout_hash']]
+          if live == 1:
+            freeable_bytes += bytes
+            freeable_tx += 1
+            del tx[txin['prevout_hash']]
+          else:
+            tx[txin['prevout_hash']] = (bytes, live - 1)
+
+  db_env.close()
+
+  print freeable_tx, 'of', total_tx, 'tx freeable'
+  print freeable_bytes, 'of', total_bytes, 'tx bytes freeable'
+
+if __name__ == '__main__':
+    main()

Can a change to the best-chain criteria protect against 51% to 90+% attacks without a hard fork?
Advertised sites are not endorsed by the Bitcoin Forum. They may be unsafe, untrustworthy, or illegal in your jurisdiction. Advertise here.
1660610838
Hero Member

Offline

Posts: 1660610838

Ignore
 1660610838

1660610838
 Report to moderator
1660610838
Hero Member

Offline

Posts: 1660610838

Ignore
 1660610838

1660610838
 Report to moderator
ByteCoin
Sr. Member

Offline

Activity: 416
Merit: 277

 May 23, 2011, 10:27:20 AM

201085 of 544296 tx freeable
63328760 of 168627001 tx bytes freeable

Good work! I hope I'll get around to checking it soon.

One complication - Although you can throw away the transactions, you have to keep the Merkle hash tree stubs so that's roughly 32 bytes per transaction (because the tree height is low) which you have to store.

ByteCoin
Mike Hearn
Legendary

Offline

Activity: 1526
Merit: 1056

 May 23, 2011, 11:33:26 AM

Great work! Simulating the merkle tree collapse is probably much harder, but even so the fact that 36% can be deleted today is great. I'd expect that number to only go up as the economy picks up pace.

Could you update the wiki page with your findings?
John Tobey
Hero Member

Offline

Activity: 481
Merit: 519

 May 23, 2011, 02:15:36 PM

Great work! Simulating the merkle tree collapse is probably much harder, but even so the fact that 36% can be deleted today is great. I'd expect that number to only go up as the economy picks up pace.

Could you update the wiki page with your findings?

Updated: "As of May, 2011, the software does not implement pruning, and the savings are calculated {link here} at 37% of transactions or 35% of raw block bytes."

I will not have time to calculate the Merkle tree prunings.  I guess a lower bound is the transaction count times 32, and an upper bound is double that.  Adding the former number to the freeable tx bytes and dividing by the raw file size gives me 38.7%, which is slightly higher than the rate in both transaction count and transaction bytes.  This seems wrong to me, but I haven't given it serious thought.

Can a change to the best-chain criteria protect against 51% to 90+% attacks without a hard fork?
John Tobey
Hero Member

Offline

Activity: 481
Merit: 519

 May 23, 2011, 02:32:09 PM

This seems wrong to me, but I haven't given it serious thought.

OOH!  Bug fix.  "tx[txin['prevout_n']]" should be "tx[txin['prevout_hash']]".  This improves the stats quite a bit:

387616 of 544296 tx freeable   (71.2%)
123285900 of 168627001 tx bytes freeable  (73.1%)

(387616*32+123285900)/180119823 = 75.3%, still too high I suspect.

Updating wiki...

Can a change to the best-chain criteria protect against 51% to 90+% attacks without a hard fork?
theymos
Legendary

Offline

Activity: 4578
Merit: 9795

 May 24, 2011, 02:43:44 AM

I get similar numbers using BBE data. 391288 transactions and 124638004 bytes over 126191 blocks.

John Tobey
Hero Member

Offline

Activity: 481
Merit: 519

 May 24, 2011, 01:25:48 PM

I get similar numbers using BBE data. 391288 transactions and 124638004 bytes over 126191 blocks.

Thanks for checking.

Can a change to the best-chain criteria protect against 51% to 90+% attacks without a hard fork?
 Pages: [1]