Bitcoin Forum
April 19, 2024, 04:53:00 PM *
News: Latest Bitcoin Core release: 26.0 [Torrent]
 
   Home   Help Search Login Register More  
Pages: [1]
  Print  
Author Topic: [CORRECTED] Script calculates 71% freeable transactions  (Read 2096 times)
John Tobey (OP)
Hero Member
*****
Offline Offline

Activity: 481
Merit: 529



View Profile WWW
May 23, 2011, 07:05:33 AM
Last edit: May 23, 2011, 02:47:54 PM by johntobey253
 #1

"nobody has done studies of the existing block chain to see how much space could be reclaimed."  --https://en.bitcoin.it/wiki/Scalability

So I gave it a go.  I hope someone will check my logic, especially as I am new to Python.

$ time src/bitcointools/freeable.py
loaded  125925  blocks
201085 of 544296 tx freeable (corrected: 387616 of 544296 tx)
63328760 of 168627001 tx bytes freeable (corrected: 123285900 of 168627001 tx bytes)

real   0m47.131s
user   0m43.799s
sys   0m2.744s
$ ls -l .bitcoin/blk0*.dat
-rw------- 1 jtobey jtobey 180119823 2011-05-23 00:42 .bitcoin/blk0001.dat

(edited: fixed bug http://forum.bitcoin.org/index.php?topic=9461.msg137059#msg137059) diff of bitcointools:
Code:
diff --git a/deserialize.py b/deserialize.py
index fe0cb09..a67645b 100644
--- a/deserialize.py
+++ b/deserialize.py
@@ -75,6 +75,7 @@ def deserialize_TxOut(d, owner_keys=None):
 
 def parse_Transaction(vds):
   d = {}
+  start = vds.read_cursor  # XXX breaks BCDataStream abstraction, do we care?
   d['version'] = vds.read_int32()
   n_vin = vds.read_compact_size()
   d['txIn'] = []
@@ -85,6 +86,7 @@ def parse_Transaction(vds):
   for i in xrange(n_vout):
     d['txOut'].append(parse_TxOut(vds))
   d['lockTime'] = vds.read_uint32()
+  d['tx'] = vds.input[start:vds.read_cursor]
   return d
 def deserialize_Transaction(d, transaction_index=None, owner_keys=None):
   result = "%d tx in, %d out\n"%(len(d['txIn']), len(d['txOut']))
diff --git a/freeable.py b/freeable.py
new file mode 100755
index 0000000..c79f458
--- /dev/null
+++ b/freeable.py
@@ -0,0 +1,87 @@
+#!/usr/bin/env python
+#
+# Read the block database, find out how many transactions
+# could be purged and how many bytes they take up.
+# TODO: find out how many Merkle hashes could be purged.
+#
+from bsddb.db import *
+import logging
+import os
+import sys
+
+from BCDataStream import *
+from block import scan_blocks
+from deserialize import parse_Block
+from util import determine_db_dir, create_env
+import Crypto.Hash.SHA256 as SHA256
+
+import binascii  # debugging
+
+def main():
+  import optparse
+  parser = optparse.OptionParser(usage="%prog [options]")
+  parser.add_option("--datadir", dest="datadir", default=None,
+                    help="Look for files here (defaults to bitcoin default)")
+  (options, args) = parser.parse_args()
+
+  if options.datadir is None:
+    db_dir = determine_db_dir()
+  else:
+    db_dir = options.datadir
+
+  try:
+    db_env = create_env(db_dir)
+  except DBNoSuchFileError:
+    logging.error("Couldn't open " + db_dir)
+    sys.exit(1)
+
+  blockfile = open(os.path.join(db_dir, "blk%04d.dat"%(1,)), "rb")
+  block_datastream = BCDataStream()
+  block_datastream.map_file(blockfile, 0)
+
+  blocks = []
+  def gather_stats(block_data):
+    block_datastream.seek_file(block_data['nBlockPos'])
+    blocks.append(parse_Block(block_datastream))
+    return True
+
+  scan_blocks(db_dir, db_env, gather_stats)
+  blocks.reverse()
+  print 'loaded ', len(blocks), ' blocks'
+
+  tx = {}
+  total_tx = 0
+  freeable_tx = 0
+  total_bytes = 0
+  freeable_bytes = 0
+
+  for data in blocks:
+    coinbase = True
+    for txn in data['transactions']:
+      tx_hash = SHA256.new(SHA256.new(txn['tx']).digest()).digest()
+      #print '> ', binascii.hexlify(tx_hash)
+      tx_bytes = len(txn['tx'])
+      tx[tx_hash] = (tx_bytes, len(txn['txOut']))
+      total_tx += 1
+      total_bytes += tx_bytes
+
+      if coinbase:
+        coinbase = False
+      else:
+        for txin in txn['txIn']:
+          #print '< ', binascii.hexlify(txin['prevout_hash'])
+          (bytes, live) = tx[txin['prevout_hash']]
+          if live == 1:
+            freeable_bytes += bytes
+            freeable_tx += 1
+            del tx[txin['prevout_hash']]
+          else:
+            tx[txin['prevout_hash']] = (bytes, live - 1)
+
+  db_env.close()
+
+  print freeable_tx, 'of', total_tx, 'tx freeable'
+  print freeable_bytes, 'of', total_bytes, 'tx bytes freeable'
+
+if __name__ == '__main__':
+    main()

Can a change to the best-chain criteria protect against 51% to 90+% attacks without a hard fork?
The Bitcoin software, network, and concept is called "Bitcoin" with a capitalized "B". Bitcoin currency units are called "bitcoins" with a lowercase "b" -- this is often abbreviated BTC.
Advertised sites are not endorsed by the Bitcoin Forum. They may be unsafe, untrustworthy, or illegal in your jurisdiction.
1713545580
Hero Member
*
Offline Offline

Posts: 1713545580

View Profile Personal Message (Offline)

Ignore
1713545580
Reply with quote  #2

1713545580
Report to moderator
1713545580
Hero Member
*
Offline Offline

Posts: 1713545580

View Profile Personal Message (Offline)

Ignore
1713545580
Reply with quote  #2

1713545580
Report to moderator
ByteCoin
Sr. Member
****
expert
Offline Offline

Activity: 416
Merit: 277


View Profile
May 23, 2011, 10:27:20 AM
 #2

201085 of 544296 tx freeable
63328760 of 168627001 tx bytes freeable


Good work! I hope I'll get around to checking it soon.

One complication - Although you can throw away the transactions, you have to keep the Merkle hash tree stubs so that's roughly 32 bytes per transaction (because the tree height is low) which you have to store.

ByteCoin
Mike Hearn
Legendary
*
expert
Offline Offline

Activity: 1526
Merit: 1128


View Profile
May 23, 2011, 11:33:26 AM
 #3

Great work! Simulating the merkle tree collapse is probably much harder, but even so the fact that 36% can be deleted today is great. I'd expect that number to only go up as the economy picks up pace.

Could you update the wiki page with your findings?
John Tobey (OP)
Hero Member
*****
Offline Offline

Activity: 481
Merit: 529



View Profile WWW
May 23, 2011, 02:15:36 PM
 #4

Great work! Simulating the merkle tree collapse is probably much harder, but even so the fact that 36% can be deleted today is great. I'd expect that number to only go up as the economy picks up pace.

Could you update the wiki page with your findings?

Updated: "As of May, 2011, the software does not implement pruning, and the savings are calculated {link here} at 37% of transactions or 35% of raw block bytes."

I will not have time to calculate the Merkle tree prunings.  I guess a lower bound is the transaction count times 32, and an upper bound is double that.  Adding the former number to the freeable tx bytes and dividing by the raw file size gives me 38.7%, which is slightly higher than the rate in both transaction count and transaction bytes.  This seems wrong to me, but I haven't given it serious thought.

Can a change to the best-chain criteria protect against 51% to 90+% attacks without a hard fork?
John Tobey (OP)
Hero Member
*****
Offline Offline

Activity: 481
Merit: 529



View Profile WWW
May 23, 2011, 02:32:09 PM
 #5

This seems wrong to me, but I haven't given it serious thought.

OOH!  Bug fix.  "tx[txin['prevout_n']]" should be "tx[txin['prevout_hash']]".  This improves the stats quite a bit:

loaded  125925  blocks
387616 of 544296 tx freeable   (71.2%)
123285900 of 168627001 tx bytes freeable  (73.1%)

(387616*32+123285900)/180119823 = 75.3%, still too high I suspect.

Updating wiki...

Can a change to the best-chain criteria protect against 51% to 90+% attacks without a hard fork?
theymos
Administrator
Legendary
*
Offline Offline

Activity: 5166
Merit: 12865


View Profile
May 24, 2011, 02:43:44 AM
 #6

I get similar numbers using BBE data. 391288 transactions and 124638004 bytes over 126191 blocks.

1NXYoJ5xU91Jp83XfVMHwwTUyZFK64BoAD
John Tobey (OP)
Hero Member
*****
Offline Offline

Activity: 481
Merit: 529



View Profile WWW
May 24, 2011, 01:25:48 PM
 #7

I get similar numbers using BBE data. 391288 transactions and 124638004 bytes over 126191 blocks.

Thanks for checking.

Can a change to the best-chain criteria protect against 51% to 90+% attacks without a hard fork?
Pages: [1]
  Print  
 
Jump to:  

Powered by MySQL Powered by PHP Powered by SMF 1.1.19 | SMF © 2006-2009, Simple Machines Valid XHTML 1.0! Valid CSS!