John Tobey (OP)
|
|
August 14, 2011, 07:04:40 PM |
|
Import status: now at block 62,000 (after 1.5 days). Note that I have a pretty slow machine that only uses 30W :-), not exactly made for db loads.
But still, that's too long. Block 62,000 is only about 5% of the way through, since the blocks are bigger now. While waiting for the import, I reverse-engineered the schema using mysql-workbench.
Thanks! Please update to the latest commit on the master branch and restart your import (it'll pick up where it left off), since I've indexed txin.txout_id. Not sure it will help here, though; I won't be surprised if there's unneeded table scanning on import. PostgreSQL took a few hours on my 4-year-old laptop last I tried, but Pg may implement foreign keys differently, and Abe currently relies on foreign keys for implicit indexing. You could add a txout->pubkey foreign key, since MySQL supports null foreign keys; apparently some databases do not. Anyway, txout.pubkey_id is indexed, I hope?
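If your import started before that commit, something like this adds the equivalent index by hand (a sketch; the index name is my invention):

CREATE INDEX x_txin_txout ON txin (txout_id);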
|
|
|
|
John Tobey (OP)
|
|
August 15, 2011, 01:49:50 AM |
|
Are you on IRC? You should hit me up on #bitcoinconsultancy
Thanks, but I have too many time obligations (ages 5 and 2) to justify regular IRC use.

You can quickly query the total difficulty for a chain using a single SQL command:
SELECT SUM(to_difficulty(bits_head, bits_body)) FROM blocks WHERE span_left=X AND span_right=X;
This will give you the total difficulty in chain X (you have to have to_difficulty() defined as bits_body * 2^(8 * (bits_head - 3))).
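(A minimal sketch of such a function in MySQL syntax, assuming DOUBLE precision is acceptable - it is lossy on large values, so treat it as illustrative only:)

CREATE FUNCTION to_difficulty(bits_head INT, bits_body BIGINT)
RETURNS DOUBLE DETERMINISTIC
RETURN bits_body * POW(2, 8 * (bits_head - 3));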
Great, but this presumably reads every block row. Are you familiar with the /q/nethash function? Abe reads every Nth (default: 144th) block's block_chain_work and subtracts the previous from the current value as part of the netHashPerSecond computation, so I do only one query that returns (at the moment) 980 rows for 979 resulting values. (It reads two table rows for every row it returns due to the trick by which I implement "every Nth" in portable SQL - see the sketch below.) I suppose not all applications would benefit from this optimization, and I could bolt it onto your schema via a trigger or patch if necessary. (Joys of open source!)
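That trick looks roughly like this (a sketch, not necessarily Abe's exact query; it ignores side chains, and assumes block.block_height exists alongside block_chain_work):

SELECT cur.block_height,
       cur.block_chain_work - prev.block_chain_work AS work_in_interval
  FROM block cur
  JOIN block prev ON (prev.block_height = cur.block_height - 144)
 WHERE MOD(cur.block_height, 144) = 0;

Each returned row reads two block rows, as described above.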
|
|
|
|
genjix
Legendary
Offline
Activity: 1232
Merit: 1076
|
|
August 15, 2011, 02:57:07 AM |
|
I'll probably add cumulative difficulty into the blocks very soon.
Still thinking about various design issues though.
|
|
|
|
John Tobey (OP)
|
|
August 15, 2011, 03:25:42 AM |
|
Import status: now at block 62,000 (after 1.5 days). Note that I have a pretty slow machine that only uses 30W :-), not exactly made for db loads.
But still, that's too long. Block 62,000 is only about 5% of the way through, since the blocks are bigger now. I've committed rewrites of two queries that MySQL did not fully optimize. My (very informal) testing suggests a speedup of 10x. Let me know if it helps.
|
|
|
|
John Tobey (OP)
|
|
August 15, 2011, 04:38:59 AM |
|
I'll probably add cumulative difficulty into the blocks very soon. Still thinking about various design issues though.

Cool.
I might suggest one thing from experience with Abe...

CREATE TABLE inputs (
    ...
    previous_output_id INT,
    previous_output_hash hash_type NOT NULL,
    previous_output_index BIGINT NOT NULL,
A hash per input will be a large fraction of storage space. Abe avoids this by putting unlinked inputs in a separate table and then deleting them when it finds the output:

CREATE TABLE txin (
    txin_id NUMERIC(26) PRIMARY KEY,
    tx_id NUMERIC(26) NOT NULL,
    txin_pos NUMERIC(10) NOT NULL,
    txout_id NUMERIC(26),
    ...)...
CREATE TABLE unlinked_txin (
    txin_id NUMERIC(26) PRIMARY KEY,
    txout_tx_hash BIT(256) NOT NULL,
    txout_pos NUMERIC(10) NOT NULL,
    FOREIGN KEY (txin_id) REFERENCES txin (txin_id)
)
After the unlinked_txin row is deleted, you can get the previous output's transaction hash via txin.txout_id to txout.tx_id to tx.tx_hash.
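For example (a sketch; txout.txout_pos is an assumption, the other columns are as above):

SELECT prev_tx.tx_hash, prev_out.txout_pos
  FROM txin
  JOIN txout prev_out ON (txin.txout_id = prev_out.txout_id)
  JOIN tx prev_tx ON (prev_out.tx_id = prev_tx.tx_id)
 WHERE txin.txin_id = ?;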
|
|
|
|
molecular
Donator
Legendary
Offline
Activity: 2772
Merit: 1019
|
|
August 15, 2011, 05:36:36 AM |
|
Import status: now at block 62,000 (after 1.5 days). Note that I have a pretty slow machine that only uses 30W :-), not exactly made for db loads.
But still, that's too long. Block 62,000 is only about 5% of the way through, since the blocks are bigger now.

I've committed rewrites of two queries that MySQL did not fully optimize. My (very informal) testing suggests a speedup of 10x. Let me know if it helps.

Yesss! At least 10 times! Thanks so much, now it seems feasible to get the data in. You're awesome. I'll answer the questions from your earlier post as soon as my mysql-workbench (which I updated and which now crashes consistently on startup) is back to an operational state.
|
PGP key molecular F9B70769 fingerprint 9CDD C0D3 20F8 279F 6BE0 3F39 FC49 2362 F9B7 0769
|
|
|
genjix
Legendary
Offline
Activity: 1232
Merit: 1076
|
|
August 15, 2011, 06:32:01 AM |
|
I'll probably add cumulative difficulty into the blocks very soon. Still thinking about various design issues though.

Cool.
I might suggest one thing from experience with Abe...

CREATE TABLE inputs (
    ...
    previous_output_id INT,
    previous_output_hash hash_type NOT NULL,
    previous_output_index BIGINT NOT NULL,
A hash per input will be a large fraction of storage space. Abe avoids this by putting unlinked inputs in a separate table and then deleting them when it finds the output:

CREATE TABLE txin (
    txin_id NUMERIC(26) PRIMARY KEY,
    tx_id NUMERIC(26) NOT NULL,
    txin_pos NUMERIC(10) NOT NULL,
    txout_id NUMERIC(26),
    ...)...
CREATE TABLE unlinked_txin (
    txin_id NUMERIC(26) PRIMARY KEY,
    txout_tx_hash BIT(256) NOT NULL,
    txout_pos NUMERIC(10) NOT NULL,
    FOREIGN KEY (txin_id) REFERENCES txin (txin_id)
)
After the unlinked_txin row is deleted, you can get the previous output's transaction hash via txin.txout_id to txout.tx_id to tx.tx_hash.

I'm not so sure that it's good to prune out the previous_output_hash, since when getting an entire block you want the entire info (which includes the hashes) - especially for displaying on a block explorer type website, hashing the transaction, performing the OP_CHECKSIG, etc. The fact is that you'll be looking up the previous hash so often that it'd be a wasted optimisation and not gain too much either (storage is cheap today). Also, the impact on performance is minimal since we've got fetching blocks down to being blindingly FAST.

There could be merit in updating the previous_output_id field when you connect the blocks (see the sketch below), but I can't see a use case that often for cycling *backwards* in the blockchain (forwards yes, but not backwards). You only really look backwards once - during the initial block-chain verification - which doesn't quite warrant the effort of adding this field.
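(A hypothetical sketch of that back-fill, using the inputs columns quoted above:)

UPDATE inputs
   SET previous_output_id = ?
 WHERE previous_output_hash = ?
   AND previous_output_index = ?;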
|
|
|
|
John Tobey (OP)
|
|
August 15, 2011, 02:04:27 PM |
|
I'm not so sure that it's good to prune out the previous_output_hash, since when getting an entire block you want the entire info (which includes the hashes) - especially for displaying on a block explorer type website, hashing the transaction, performing the OP_CHECKSIG, etc.
Good points.
|
|
|
|
genjix
Legendary
Offline
Activity: 1232
Merit: 1076
|
|
August 15, 2011, 05:17:55 PM |
|
Thanks.
I'm disconnecting from the forums for a few days to focus. I can be contacted using my email address on the front page of bitcoin.org.
|
|
|
|
John Tobey (OP)
|
|
August 15, 2011, 05:23:28 PM |
|
Still thinking about various design issues though.
One more request relevant to Abe. Would you please design the library with alternative chains in mind? This is a must-have for Abe, which has good support for non-BTC currencies and a correspondingly inclined user base.

CREATE DOMAIN amount_type AS NUMERIC(16, 8)
    CHECK (VALUE < 21000000 AND VALUE >= 0);

21 million is BTC-specific.

CREATE DOMAIN address_type AS VARCHAR(110);
I'm not sure what you have in mind for addresses, but bear in mind that different currencies use different "address type" bytes and thus different address spaces. Abe doesn't store addresses at all, just the public key hash160. It efficiently looks up an address by extracting the pubkey hash (base58 decode) and using that as a key. It even does initial substring searches for addresses by converting the substring into one or a few pubkey hash ranges.

Address owners can (and do) use the same key pair in different networks under different names (addresses). I expect this to become more common, and a frequent query will be to find balances of a given pubkey hash in all known currencies (see the sketch below).

I also anticipate chain splits where both sides remain active indefinitely, especially with the advent of merged mining. I like your span_left/span_right system for tracking small side chains, since it gives me a quick way to tell if any given block is an ancestor of another. But I'm not sure how to apply it to this case. Would you consider moving the span columns to a new table indexed by block_id and chain_id, corresponding to Abe's chain_candidate?

Cool stuff. Thanks.
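That cross-chain query might look roughly like this (a sketch showing total received rather than balance; block_tx and the exact column names are assumptions):

SELECT cc.chain_id, SUM(o.txout_value) AS total_received
  FROM pubkey p
  JOIN txout o ON (o.pubkey_id = p.pubkey_id)
  JOIN block_tx bt ON (bt.tx_id = o.tx_id)
  JOIN chain_candidate cc ON (cc.block_id = bt.block_id
                          AND cc.in_longest = 1)
 WHERE p.pubkey_hash = ?
 GROUP BY cc.chain_id;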
|
|
|
|
John Tobey (OP)
|
|
August 16, 2011, 07:32:40 AM |
|
Versions 0.4.1 and 0.5 released today. 0.4.1 contains mostly minor fixes since 0.4. 0.5 contains dramatic speed improvements, new back-ends, and more.

New in 0.5 - 2011-08-16
* Big speed improvement for address history and transaction pages.
* Big load time improvement for SQLite: below 10 hours for the BTC chain.
* MySQL supported.
* Oracle supported, but slow due to lack of transparent bind variable use in cx_Oracle.
* BBE-compatible HTTP API functions: nethash, totalbc, addresstohash, hashtoaddress, hashpubkey, checkaddress.
* New HTTP API functions: translate_address, decode_address.
* Online list of API functions (/q).
* Native BeerTokens currency support.
* Many minor improvements; see the Git log.

New in 0.4.1 - 2011-08-16
* Security enhancement: refer to orphan blocks by hash, not height.
* Fixed bugs affecting new chains defined via the configuration.
* Warn, do not exit, if a block file is missing or unparsable.
* Abe parses the new merged-mining block field, CAuxPow.
* Decrement the value returned by getblockcount for compatibility.
* Bug fix: remove '-' from parenthesized amounts.
* Fixed previous/next block links on /chain/CHAIN/b/NUMBER pages.
* Accept "var += val" in configuration as equivalent to "var = val" where "var" has not been defined.
* Added --commit-bytes option to adjust the database commit interval.
* Minor robustness and cosmetic improvements.

Enjoy!
|
|
|
|
molecular
Donator
Legendary
Offline
Activity: 2772
Merit: 1019
|
|
August 16, 2011, 08:17:31 AM |
|
I hope this is the right place to ask this and not considered offtopic. I'm having some fun writing queries for my bitcoin-abe db, which finally finished importing after half an additional day - thanks again for the optimization, John (example: https://bitcointalk.org/index.php?topic=37333.msg458441#msg458441). Now I know I'm making the mistake of including blocks from orphaned chains. Is there an easy way to exclude blocks from orphaned chains in an SQL query?
|
PGP key molecular F9B70769 fingerprint 9CDD C0D3 20F8 279F 6BE0 3F39 FC49 2362 F9B7 0769
|
|
|
John Tobey (OP)
|
|
August 16, 2011, 01:33:43 PM |
|
I hope this is the right place to ask this and not considered offtopic.
I don't object if the moderators don't. I guess we could also use the GitHub issue system. (The issue here is a lack of database documentation or a user guide.)

Now I know I'm making the mistake of including blocks from orphaned chains.
Is there an easy way to exclude blocks from orphaned chains in a sql query?
Yes, well, not too hard:

select b.*
  from block b
  join chain_candidate cc on (b.block_id = cc.block_id)
 where cc.chain_id = ?
   and cc.in_longest = 1
|
|
|
|
molecular
Donator
Legendary
Offline
Activity: 2772
Merit: 1019
|
|
August 16, 2011, 04:28:05 PM |
|
Now I know I'm making the mistake of including blocks from orphaned chains.
Is there an easy way to exclude blocks from orphaned chains in a sql query?
Yes, well, not too hard: select b.* from block b join chain_candidate cc on (b.block_id = cc.block_id) where cc.chain_id = ? and cc.in_longest = 1

I see, of course. Thanks a lot for your help.

Hey, someone gave me a donation for a chart I made about coin age (pretty interesting: https://bitcointalk.org/index.php?topic=37333.msg459678#msg459678) using bitcoin-abe. Couldn't have done it without Abe, so I'm passing on part of it to you.
|
PGP key molecular F9B70769 fingerprint 9CDD C0D3 20F8 279F 6BE0 3F39 FC49 2362 F9B7 0769
|
|
|
jackjack
Legendary
Offline
Activity: 1176
Merit: 1280
May Bitcoin be touched by his Noodly Appendage
|
|
August 16, 2011, 05:27:23 PM |
|
That's incredible. It took several days to read 124k blocks of the Bitcoin blockchain, but this version already read 128k blocks in less than 6 hours.
|
Own address: 19QkqAza7BHFTuoz9N8UQkryP4E9jHo4N3 - Pywallet support: 1AQDfx22pKGgXnUZFL1e4UKos3QqvRzNh5 - Bitcointalk++ script support: 1Pxeccscj1ygseTdSV1qUqQCanp2B2NMM2 Pywallet: instructions. Encrypted wallet support, export/import keys/addresses, backup wallets, export/import CSV data from/into wallet, merge wallets, delete/import addresses and transactions, recover altcoins sent to bitcoin addresses, sign/verify messages and files with Bitcoin addresses, recover deleted wallets, etc.
|
|
|
Yanz
|
|
August 16, 2011, 06:46:01 PM |
|
Hi Yanz. This error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xf3' in position 0: ordinal not in range(128)
is fixed in the latest commit. Thanks! But I don't think it explains the "lockup". Are you sure Abe isn't just loading data? For SQLite, the full block chain can take a week or longer. SQLite is okay for small, experimental block chains, though even there you could wait hours. (Optimization of the initial data load should be a high-priority enhancement. Currently it's slow but simple, since it uses the same mechanism as the run-time new block import, which cannot be optimized the way I have in mind.) Let me know how it goes.

Thanks John! The error is fixed with MySQL. Yeah, I figured the "lockup" was just loading data, because I ran it against a local SQLite database and it froze on me. So I switched to MySQL and now it's fixed.
|
With great video cards comes great power consumption.
|
|
|
John Tobey (OP)
|
|
August 17, 2011, 01:36:08 AM |
|
Happy to see someone doing interesting things. Thanks for the donation.
|
|
|
|
newminerr
Member
Offline
Activity: 147
Merit: 11
The day to rise has come.
|
|
August 19, 2011, 12:58:39 PM |
|
I can't get this to run on mysql or even pgsql.
How do I pass the host, user, pass, db again?
|
|
|
|
John Tobey (OP)
|
|
August 19, 2011, 02:35:18 PM |
|
I can't get this to run on mysql or even pgsql.
How do i pass the host,user,pass,db again?
mysql> create database abe;
Query OK, 1 row affected (0.03 sec)

mysql> CREATE USER abe IDENTIFIED BY 'Bitcoin';
Query OK, 0 rows affected (0.00 sec)

mysql> grant all on abe.* to abe;
Query OK, 0 rows affected (0.00 sec)

Then run:

./abe.py --dbtype MySQLdb --connect-args '{"user":"abe","host":"127.0.0.1","db":"abe","passwd":"Bitcoin"}' --port 2750 --upgrade

Let me know how it goes.
|
|
|
|
newminerr
Member
Offline
Activity: 147
Merit: 11
The day to rise has come.
|
|
August 19, 2011, 03:15:14 PM |
|
Then run: ./abe.py --dbtype MySQLdb --connect-args '{"user":"abe","host":"127.0.0.1","db":"abe","passwd":"Bitcoin"}' --port 2750 --upgrade
Let me know how it goes.
I am running this on Windows btw, could that be the problem?

    super(Connection, self).__init__(*args, **kwargs2)
_mysql_exceptions.OperationalError: (2005, "Unknown MySQL server host ''{user:root,host:127.0.0.1,db:abe,passwd:password}'' (11004)")
|
|
|
|
|