Bitcoin Forum
December 26, 2024, 11:36:23 PM *
News: Latest Bitcoin Core release: 28.0 [Torrent]
 
   Home   Help Search Login Register More  
Pages: [1] 2 3 4 5 6 7 8 9 10 11 12 13 »  All
  Print  
Author Topic: libbitcoin  (Read 92493 times)
genjix (OP)
Legendary
*
Offline Offline

Activity: 1232
Merit: 1076


View Profile
July 21, 2011, 02:33:00 AM
Last edit: June 01, 2014, 10:26:50 PM by genjix
Merited by ABCbits (54)
 #1


libbitcoin: Product of Calafou. Endorsed by Enric Duran.

Website, http://libbitcoin.dyne.org
IRC, Freenode, #darkwallet
Forum, https://forum.unsystem.net

libbitcoin is a community of developers building the open source library, tools and implementation necesary for a free, independent and vibrant Bitcoin. In this way we are helping to build a better future.

libbitcoin believes in the revoltionary promise of Satoshi's original protocol.

The libbitcoin development project aims to create an extendable, scalable and configurable architecture, along with useful software. Making Bitcoin super-pluggable, highly configurable and easy to interact with.

Our Values:

  • Privacy: Bitcoin should always remain as private as possible for its users.
  • Scalability: Bitcoin built today with the future in mind.
  • Integrity: No individual or group should have enough power over the network to compromise its original aims.

Here is the git repo,
https://github.com/spesmilo/libbitcoin

By having a bitcoin library, it enables everybody to build apps really quickly using a common stable codebase. This project aims to create that architecture to be extendable, scalable and configurable in the future.

    rewrite bitcoin, make it super-pluggable, very easy to do and hack everything at every level, and very configurable

Documentation: http://libbitcoin.dyne.org/doc/index.html

Slides: http://libbitcoin.dyne.org/libbitcoin.pdf

The Zen of libbitcoin
Readability over speed.
Beauty over convenience.
Simplicity over complexity.
Architected, not hacked.
Flat, not nested.
Explicit, not implicit.
Errors should be loud.
Never is better than right now.
Now is better than never.
Be flexible and configurable.
Build houses from bricks, software from modules.

libbitcoin is licensed as Lesser AGPL:
- When used in the p2p network, you only need to provide changes on demand (LGPL).
- When used on a webserver, you must proactively provide sourcecode for any changes you have made to libbitcoin.
- Applications can link against libbitcoin, and they do not need to release their changes publically.
Thanks to Aaron Williamson of the SFLC and Richard Stallman of the FSF for helping draft up this license.

Mailing list: https://mailinglists.dyne.org/cgi-bin/mailman/listinfo/libbitcoin

Why the GPL?

http://arstechnica.com/gadgets/2013/10/googles-iron-grip-on-android-controlling-open-source-by-any-means-necessary/

http://hintjens.com/blog:68
genjix (OP)
Legendary
*
Offline Offline

Activity: 1232
Merit: 1076


View Profile
July 21, 2011, 02:43:55 AM
 #2



This is the inner core which consists of a bunch of subsystems co-operating over clearly defined interfaces.



As an example one such message could command the block_chain_manager to create a transaction. It then sends the new transaction to the network component to actualise it.



The inner core is very difficult to use so we abstract it's usage to a more usable outer core. Think of the inner core like X-windows and outer core an easy to use GUI toolkit like Qt (by analogy).



Into the core we can switch in and out different components. In this example we choose to replace a MySQL serialiser/de-serialiser with an SQLite one. Maybe there could also be in-memory serialisers (never store to disk) or whatever.

A serialiser takes in a data object and then stores it. The type of component decides where + how it looks when stored. When reading the object it de-serialises in reverse.

The whole point is not to just dump binary blobs into MySQL rows but to put them there in a readable format so an admin can dissect them in case of problems. We also will only create a MySQL serialiser for the time being unless it's possible to use standard SQL and simultaneously support MySQL, PostgresSQL, SQLite.



Above the diagram for the outer core indicated an exposed C api with auto-generated Swig bindings.

We try to keep the core very simple and focused; no mining, no accounts, no GUI, no RPC server. These things can all be implemented in Python. Already genjix has Spesmilo as a Qt GUI using bitcoind that we can easily use since it has a special interface that can be re-implemented for whatever Bitcoin implementation. Accounts can be provided as a Python module or special plugin in C or C++.



As another example, by being able to switch the networking component, we can replace it with a dummy networking component that is used for debugging.



Or use one networking component for multiple instances of Bitcoin or other currencies.

We try to split everything up as much as possible to help developers long term. It makes it easier for new contributors to get started. That's the free-software way: lots of components and clear separation to allow maintainers to work on their piece of turf.

Callbacks

A good plugin+component system is a must for this project. The core will probably have to be thread-safe if we use threads for networking. Otherwise an event system could be a bottleneck since we get the worst of both worlds.

If we make a block-chain verifier subsystem, is there an object to making it one-shot (verify once or every X blocks the block chain in one go) or does it need to be after every block is downloaded?

In any case, it would be nicer to be able to hook the verifier system to a generic callback as opposed to calling it manually in the code.

boost::signals2 seems like a good choice to be considered, although it has to be seen whether there is any bottleneck. Also any callback system would have to be exposed in Python somehow

Random notes:

    * Our main target is large scale deployments. Bitcoin Enterprise (utilising our libbitcoin). The Red Hat of Bitcoin, not the Ubuntu.
    * Use boost::property_map?
    * boost::asio is a good choice for networking.
    * Externally we expose a C api for people that cannot program C++ but use C++ internally. A C ABI is also nice since it doesn't get mangled by the compiler.
    * Status during initialisation for debugging/showing progress in GUI client.

error
Hero Member
*****
Offline Offline

Activity: 588
Merit: 500



View Profile
July 21, 2011, 02:46:19 AM
 #3

Nice work! You'll also have a C++ API, right?

3KzNGwzRZ6SimWuFAgh4TnXzHpruHMZmV8
genjix (OP)
Legendary
*
Offline Offline

Activity: 1232
Merit: 1076


View Profile
July 21, 2011, 02:51:31 AM
Last edit: July 27, 2011, 03:50:07 AM by genjix
 #4

A document describing the rationale behind the database structure.



Axioms

Starting with these basic rules, we design our database.

    * Blocks are mostly sequential.
    * Blocks contain a list of transactions.
    * Transactions contain a list of inputs + outputs.
    * Blocks can be free floating (orphan blocks).
    * Transactions can be free floating (not yet in a block).
    * There is one sequence of events that we accept as the truth at any one time.

From big to small:

    * Blocks
    * Transactions
    * Inputs
    * Outputs
    * Scripts
    * Operations

Block-chain

The block-chain is a series of blocks chained on in a linear sequence. The block-chain is an append only database of all transactions in the network.

What's accepted as true, is the chain with the highest difficulty. Difficulty is computed using a function from the 'bits' field of the block. Given an entire chain, we can calculate the difficulty using:

Code:
    SELECT SUM(COMPUTE_DIFFICULTY(bits)) FROM our_chain;

Occasionally this chain of events may be conflicting when the chain splits in half because there are two next rival blocks trying to append to the current chain. Always, the chain with the highest cumulative difficulty is preferred over the others.

It can happen that more blocks build on a chain that previously had lower difficulty and so take over the main chain. This is termed a block-chain re-organisation. But as more blocks build off the main chain, that becomes exponentially less likely until eventually it becomes a near impossibility. Branches off the main block-chain can be pruned once they reach a certain depth that it's near unlikely they will ever become the main branch again.

We can represent our blocks using this schema:

Code:
blocks:
    block_id
    block_hash
    depth       -- block number in the chain
    span_left
    span_right
    ... other fields



Code:
block_id 	depth 	span_left 	span_right
0 0 0 3
1 1 0 3
2 2 0 2
3 2 3 3
4 3 3 3
5 3 0 2
6 4 3 3
7 4 0 0
8 5 0 0
9 4 1 2
10 5 1 2
11 6 1 2
12 6 0 0
13 7 0 0
14 8 0 0
15 7 1 1
16 7 2 2

In the diagram above blocks are represented by black dots. The block_id is a unique identifier for each new block inserted into the database. Each of the dots above would have a unique block_id.

Depth is the vertical number on the left, and is synonymous with a block's block-number.

Span left and right indicates the range of chains that a block exists in. A block with a span_left of 0, and a span_right of 3, would exist in chains 0, 1, 2 and 3. Using a span, we can easily select a particular chain:

Code:
    SELECT * FROM blocks WHERE span_left=0 AND span_right=0;

Or select the common point between a series of chains:

Code:
    SELECT MAX(depth) FROM blocks WHERE span_left>=0 AND span_right<=2;

Once span_right starts to exceed a value (say 10) and the depth since the chain forked exceeds a certain value, then that chain can be pruned:

Code:
    DELETE FROM blocks WHERE span_left=10 AND span_right=10;

The total span range of all blocks should equal all the number of stored chains. If we have a span of [0, 10] then there should be 11 spans from 0 to 10. A table exists to keep track of all the chains.

Code:
blockchains:
    chain_id
    max_depth
    last_block_id
    total_difficulty

Transaction

Nothing unusual happens here, except that transactions can be included more than once in multiple chains. A seperate table exists to track the parents for a transaction.

Code:
transaction_parents:
    transaction_id
    block_id
    index_in_block

transactions:
    transaction_id
    transaction_hash
    ... other fields

outputs:
    parent_id       -- transaction parent
    index_in_parent
    output_type     -- normal, generated, other
    address
    ... other fields

inputs:
    parent_id
    index_in_parent
    previous_output_id
    ... other fields

Script

A script consists of a serious of opcodes and optional data.

Code:
script_id_type = SEQUENCE

operations:
    operation_id
    script_id
    code
    data

A typical script might appear like:

Code:
operation_id 	script_id 	code 	data
132 45 dup NULL
133 45 hash160 NULL
134 45 special 89 ab cd ef ab ba ab ba ab ba ab ba ab ba ab ba ab ba ab ba
135 45 equalverify NULL
136 45 checksig NULL

To fetch the script:

Code:
    SELECT * FROM operations WHERE script_id=45 ORDER BY operation_id ASC;

Broadcasted inventories

This will be stored in memory for now as there are potential attack vectors to serialising inv items. An attacker could cause many disk read/writes by spamming a host potentially. More thought needs to go into this later.

Main block-chain

A convenience table is provided for the end user to read the block-chain transaction history. It contains the currently accepted version of the truth. It provides a way for the user to quickly query transactions and see funds in addresses.



The inputs and outputs are joined in a view. All the outputs which are final and have not been spent, would have no equivalently joined on input.
genjix (OP)
Legendary
*
Offline Offline

Activity: 1232
Merit: 1076


View Profile
July 21, 2011, 02:52:01 AM
 #5

Nice work! You'll also have a C++ API, right?

Yeah the C++ API is the default, but I want to provide something for C users too later. And also Ruby, Perl, Python, Java .etc through SWIG bindings.
wumpus
Hero Member
*****
Offline Offline

Activity: 812
Merit: 1022

No Maps for These Territories


View Profile
July 21, 2011, 05:36:43 AM
 #6

I think it is a nice design overall, but

A script consists of a serious of opcodes and optional data.

Code:
script_id_type = SEQUENCE

operations:
    operation_id
    script_id
    code
    data

A typical script might appear like:

Code:
operation_id 	script_id 	code 	data
132 45 dup NULL
133 45 hash160 NULL
134 45 special 89 ab cd ef ab ba ab ba ab ba ab ba ab ba ab ba ab ba ab ba
135 45 equalverify NULL
136 45 checksig NULL

To fetch the script:

Code:
    SELECT * FROM operations WHERE script_id=45 ORDER BY operation_id ASC;
I wonder here, what is the advantage to decomposing the scripts into separate database rows, instead of storing it as a binary blob?

I mean, the most common operation is "fetch the entire script and execute the opcodes". By decomposing the data structure this deep this becomes a more expensive query.


Bitcoin Core developer [PGP] Warning: For most, coin loss is a larger risk than coin theft. A disk can die any time. Regularly back up your wallet through FileBackup Wallet to an external storage or the (encrypted!) cloud. Use a separate offline wallet for storing larger amounts.
jrmithdobbs
Newbie
*
Offline Offline

Activity: 67
Merit: 0


View Profile
July 21, 2011, 05:48:13 AM
 #7

I wonder here, what is the advantage to decomposing the scripts into separate database rows, instead of storing it as a binary blob?

I mean, the most common operation is "fetch the entire script and execute the opcodes". By decomposing the data structure this deep this becomes a more expensive query.

For a node, maybe. What about for data processing? Say you just want to quickly find all scripts with the same destination address. The performance difference will be negligible for the amount of flexibility gained. Returning data like this is what RDBMS' are optimized for.
wumpus
Hero Member
*****
Offline Offline

Activity: 812
Merit: 1022

No Maps for These Territories


View Profile
July 21, 2011, 06:29:45 AM
 #8

For a node, maybe. What about for data processing? Say you just want to quickly find all scripts with the same destination address.
Yes there are cases in which you want to search in a script, but they are not very common. Usually you want to optimize for the most common case first.

The performance difference will be negligible for the amount of flexibility gained. Returning data like this is what RDBMS' are optimized for.
Maybe; I wouldn't be so quick to call it 'negligible', as I do know that for the current client, database accesses are by far the biggest source of performance issues.

Anyway, don't do too much premature optimisation, it might not be useful yet to discuss in this phase.

Bitcoin Core developer [PGP] Warning: For most, coin loss is a larger risk than coin theft. A disk can die any time. Regularly back up your wallet through FileBackup Wallet to an external storage or the (encrypted!) cloud. Use a separate offline wallet for storing larger amounts.
LightRider
Legendary
*
Offline Offline

Activity: 1500
Merit: 1022


I advocate the Zeitgeist Movement & Venus Project.


View Profile WWW
July 21, 2011, 06:59:04 AM
 #9

Good effort, thank you so much for doing this work.

Bitcoin combines money, the wrongest thing in the world, with software, the easiest thing in the world to get wrong.
Visit www.thevenusproject.com and www.theZeitgeistMovement.com.
genjix (OP)
Legendary
*
Offline Offline

Activity: 1232
Merit: 1076


View Profile
July 21, 2011, 09:56:19 AM
 #10

Maybe there is some merit to storing the script as:

"dup hash160 [1a 4b ... data ... ff 65] equalverify checksig"

And parsing them in and out.

The primary reason I stored them the way I do is because that's how they're represented internally:

https://gitorious.org/libbitcoin/libbitcoin/blobs/master/include/bitcoin/script.hpp
grondilu
Legendary
*
Offline Offline

Activity: 1288
Merit: 1080


View Profile
July 21, 2011, 01:13:41 PM
 #11

(registering to this thread)

Mike Hearn
Legendary
*
Offline Offline

Activity: 1526
Merit: 1134


View Profile
July 21, 2011, 02:12:30 PM
 #12

Could you summarize what this library is capable of and what remains to be done? I took a quick look but it seems to be mostly outlines so far.
genjix (OP)
Legendary
*
Offline Offline

Activity: 1232
Merit: 1076


View Profile
July 25, 2011, 11:16:39 PM
 #13

ATM it's a very basic library. You can construct transactions and send them over the network. Also you can download blocks/txs and partially verify scripts.

What's missing mainly is the transaction abstraction engine.

TODO:
phase 2:
- OP_CHECKSIG
phase 3:
- EC keypair generating
- High level construction of transactions.
- session class (session->connect();  session->send_funds("1djkj3jk3kj3k", 1000000000); session->get_balance(); .etc)
phase 4:
- Lastly would be to make a gui, but I'm hoping others will build them as this library becomes more mature.
- Fully complete scripting engine that isn't disabled.

As you can see, ATM it's 0.1alpha release.

Progress can be followed at, https://bitcoinconsultancy.com/wiki/index.php/Libbitcoin#Roadmap
jackjack
Legendary
*
Offline Offline

Activity: 1176
Merit: 1280


May Bitcoin be touched by his Noodly Appendage


View Profile
July 26, 2011, 12:33:51 PM
 #14

Monitoring this awesome thread!

A pic has been deleted: http://forum.bitcoin.org/index.php?topic=30646.msg384841#msg384841

Own address: 19QkqAza7BHFTuoz9N8UQkryP4E9jHo4N3 - Pywallet support: 1AQDfx22pKGgXnUZFL1e4UKos3QqvRzNh5 - Bitcointalk++ script support: 1Pxeccscj1ygseTdSV1qUqQCanp2B2NMM2
Pywallet: instructions. Encrypted wallet support, export/import keys/addresses, backup wallets, export/import CSV data from/into wallet, merge wallets, delete/import addresses and transactions, recover altcoins sent to bitcoin addresses, sign/verify messages and files with Bitcoin addresses, recover deleted wallets, etc.
davout
Legendary
*
Offline Offline

Activity: 1372
Merit: 1008


1davout


View Profile WWW
August 14, 2011, 06:49:30 AM
 #15

Monitoring this awesome thread!
+1

ThomasV
Legendary
*
Offline Offline

Activity: 1896
Merit: 1355



View Profile WWW
August 14, 2011, 07:42:58 AM
 #16

this is fantastic. thank you!!!

Electrum: the convenience of a web wallet, without the risks
vector76
Member
**
Offline Offline

Activity: 70
Merit: 18


View Profile
August 14, 2011, 01:37:17 PM
 #17

Great work, I just have a question about the span.  I don't understand how it works, because if you fork the block chain, won't all the existing span numbers have to be incremented?

I've been trying to implement a sqlite backend (so far just dump to sqlite which is mostly working), and it seems to be helpful to distinguish between dynamic data and data which is immutable such as the successor relationship between blocks, transaction containment within a block, etc.

Whether a block is in the best chain and whether a txout has been spent in the best chain is dynamic, so I have to implement queries that more or less correspond to ConnectBlock and DisconnectBlock.  I would love to be able to eliminate it using your span mechanism or something similar but I'm not understanding how it works.
genjix (OP)
Legendary
*
Offline Offline

Activity: 1232
Merit: 1076


View Profile
August 14, 2011, 02:53:50 PM
 #18

Great work, I just have a question about the span.  I don't understand how it works, because if you fork the block chain, won't all the existing span numbers have to be incremented?

I've been trying to implement a sqlite backend (so far just dump to sqlite which is mostly working), and it seems to be helpful to distinguish between dynamic data and data which is immutable such as the successor relationship between blocks, transaction containment within a block, etc.

Whether a block is in the best chain and whether a txout has been spent in the best chain is dynamic, so I have to implement queries that more or less correspond to ConnectBlock and DisconnectBlock.  I would love to be able to eliminate it using your span mechanism or something similar but I'm not understanding how it works.

I'm still working on ProcessBlock (verifying the blocks) and everything below it (ConnectBlock/DisconnectBlock). You can see an example of how the span works in src/storage/postgresql_storage.cpp:reorganize_block_chain()... That function will be moved internally and within verify_block under include/bitcoin/verify.hpp

Indeed you're correct that all the spans of the parents would have to be incremented if you fork the block chain. This is OK since it can be done in a simple single UPDATE, inserting a new block is an infrequent operation (doesn't have to be as fast as SELECTs) and it's not a big deal if adding a new block isn't immediate.

Far more important is the select speed. If you run the poller and postgres benchmark:

Code:
$ ./bin/examples/poller dbname dbuser dbpass localhost:8333
... wait to download all the blocks
$ make psql
$ time ./bin/tests/psql
... run benchmark to fetch 100 blocks

With my current setup it takes 10 secs to fetch 100 blocks or 0.1 secs for 1 block. That means 4 hours to fetch the entire block-chain of 140k blocks. Using a rule of thumb that verifying the block-chain will be 10 times as slow, that might mean 40 hours to verify all 140k blocks. ATM I plan to just bundle the first 120k pre-verified blocks in the code, then later see about optimising this.

But I am running it over a VPS on rackspace, so that also could be a reason for the speed too.

It should be possible for you to easily modify the current storage engine I'm using to enable sqllite... The library supports multiple backends. See the constructor in src/storage/postgresql_storage.cpp:postgresql_storage() ... The sql_ object is being explicitly initialised with postgresql parameters. I don't mind making this a generic sql_storage backend if you're able to get it to work Smiley

The current way of joining the blocks using reorganize_block_chain() will be removed and done as part of the block-chain verification- that's the best way I see currently to do it... But I am doing more research of the current bitcoin sourcecode to make sure I understand everything fully. The current bitcoin is very inefficient/obtuse and there's a whole bunch of ways to do things quicker or nicer.
vector76
Member
**
Offline Offline

Activity: 70
Merit: 18


View Profile
August 14, 2011, 06:58:43 PM
 #19

Ok that makes sense.  I am a bit concerned that even though the update is a single statement, it must touch a lot of rows, but it should be infrequent enough to not be a problem.

My next question is regarding the very common operation of determining whether an output has been spent.  Each output could be spent in a single transaction that belongs to multiple blocks (in separate forks) or by multiple transactions (also in separate forks).

So it seems that to determine if an output has been spent in a particular chain, you will have to query for all the inputs that spend the output, find (JOIN) the transactions that contain those inputs, find (JOIN) the blocks that contain those transactions, and see if any of those blocks have a span range that contains the chain in question.  I know databases are supposed to be good at this kind of thing but it sure sounds like a lot of work.  On the other hand, most of the transactions will not have a double-spend, so all the joining shouldn't make a difference in the most common case where there are no inputs that spend a txout.  All the joining might slow it down a bit when the blockchain forks, but a few extra indexed lookups shouldn't be that big a deal I guess.

I'm thinking I'll switch to your schema with the spans, because it bugs me how my version has dynamic fields marking whether a block is part of the 'best chain' and whether a txout has been spent in the current best chain.  It is a real headache to add a block to a fork other than the current best.
genjix (OP)
Legendary
*
Offline Offline

Activity: 1232
Merit: 1076


View Profile
August 15, 2011, 06:22:09 AM
 #20

Ok that makes sense.  I am a bit concerned that even though the update is a single statement, it must touch a lot of rows, but it should be infrequent enough to not be a problem.

My next question is regarding the very common operation of determining whether an output has been spent.  Each output could be spent in a single transaction that belongs to multiple blocks (in separate forks) or by multiple transactions (also in separate forks).

So it seems that to determine if an output has been spent in a particular chain, you will have to query for all the inputs that spend the output, find (JOIN) the transactions that contain those inputs, find (JOIN) the blocks that contain those transactions, and see if any of those blocks have a span range that contains the chain in question.  I know databases are supposed to be good at this kind of thing but it sure sounds like a lot of work.  On the other hand, most of the transactions will not have a double-spend, so all the joining shouldn't make a difference in the most common case where there are no inputs that spend a txout.  All the joining might slow it down a bit when the blockchain forks, but a few extra indexed lookups shouldn't be that big a deal I guess.

I'm thinking I'll switch to your schema with the spans, because it bugs me how my version has dynamic fields marking whether a block is part of the 'best chain' and whether a txout has been spent in the current best chain.  It is a real headache to add a block to a fork other than the current best.

I plan to add a flag indicating whether an output is_final (and the parent transaction is_final - if all child outputs all have is_final set) and an output_address and for transaction a transaction_type (standard, generated and other) fields.

Also will be another table with chain_id, cumulative_difficulty... and cumulative difficulty in blocks.

For now still working on the block chain verification specifics (which algorithm is best and whether to organize then verify, pre-verify then organize then verify, or verify then link and organize). There are various pros and cons for all the different ways.
Pages: [1] 2 3 4 5 6 7 8 9 10 11 12 13 »  All
  Print  
 
Jump to:  

Powered by MySQL Powered by PHP Powered by SMF 1.1.19 | SMF © 2006-2009, Simple Machines Valid XHTML 1.0! Valid CSS!