Print Page - An optimal engine for UTXO db

Title: An optimal engine for UTXO db
Post by: piotr_n on March 04, 2016, 08:17:49 PM

No, I don't have one :)

I was just wondering, because I know the UTXO db is a very important component of the bitcoin system and it is obviously going to be somehow profitable in a future to have it well performing.

How would you imagine a database engine designed specifically for bitcoins UTXO db?

I think it's very important to be able to browse through all the records in a shortest possible time.

Title: Re: An optimal engine for UTXO db
Post by: 2112 on March 05, 2016, 04:45:33 AM

It really is a pointless question.

Quote from: piotr_n on March 04, 2016, 08:17:49 PM

I think it's very important to be able to browse through all the records in a shortest possible time.

For whom is the speed so important?

For the people who treat finances responsibly the transactional integrity will be most important specification, immediately followed by capability to integrate with existing financial and accounting middleware. Speed would be secondary or even tertiary, especially since the UTxO set is widely replicated and six confirmations take one hour on average.

Title: Re: An optimal engine for UTXO db
Post by: piotr_n on March 05, 2016, 08:35:27 AM

Quote from: 2112 on March 05, 2016, 04:45:33 AM

It really is a pointless question.

Quote from: piotr_n on March 04, 2016, 08:17:49 PM

I think it's very important to be able to browse through all the records in a shortest possible time.

You're mixing up things. Probably because you don't really understand what the UTXO database actually does inside the bitcoin software.

Spreed of accessing the records is important for obvious reasons.
The traditional reason is that you want to verify new transactions/blocks quickly and efficiently.
And the new reasons, like hardware wallets or any kind of stealth payments.

The current approach from the satoshi's client, where the wallet is glued to the node and keeps track of its addresses, updating their balance as new blocks appear - IMHO this solution has proven to be unreliable. Plus is totally unfeasible for private offline wallets.

An efficient UTXO database that can quickly browse through all the records is definitely a must for the next generation of bitcoin node software.
I am just not aware of any existing engines that would be ready to replace the currently used leveldb.
I think one eventually needs to be (and will be) created explicitly for Bitcoin. Just like sipa's secp256k1 lib was created explicitly for Bitcoin, as the originally used openssl's implementation wasn't good enough to handle a crucial part of the software.

Title: Re: An optimal engine for UTXO db
Post by: jl777 on March 05, 2016, 03:56:58 PM

for utxo, my approach with iguana creates a parallel set of datasets in read only memory mapped files. Arguably, this is the ultimate DB as it simply cant get any faster than this.

All things needed for block explorer level queries are precalculated and put into the read only files.

By having each bundle of 2000 blocks being mostly independent of all other blocks, this not only allows for parallel syncing of the blockchain but also using multiple cores for all queries regarading tx history.

I also calculate all address balances for each bundle, so by adding all the values across all the bundles you end up with the current balance as of the most recent bundle. Special case handling of the realtime bundle gets us up to date block explorer data for all blocks, tx, addresses (even multisig)

I dont quite have everything done, but the basic structure is showing very good results. I also remove as much redundancy as possible during the creation of the read-only files so instead of taking more space than the raw blockchain, it ends up taking about half the space. About half of that is the signatures, which can be purged for non-relaying nodes after it has been validated.

The format of the utxo vector would just be the location of the unspent that is being spent and the location of the spend is implicit in the order of the vins.

[u0, u1, u2, u3, ...] is a list of 6 byte locators for the unspent that the corresponding vin in the bundle is spending, where each ui contains the bundle id and the unspentind. Each unspentind is simply the order it appears in the bundle (starting at 1).

the above is not directly usable, but it is invariant and can be created once and put into the read-only file(system). using mksquashfs reduces the size of the index tables by almost 50%.

Given such a list for all bundles, then using a parallel multicore process, they can all be traversed and update a bitmap to indicate which outputs are spent. So a 0 bit for this means it is unspent. Maybe 1 bit of space per utxo is good enough, as that is seemingly as good as it gets, but it is 1 bit per vout, so more like 30MB size. However, I think this can be compressed pretty good with most of the early vouts already being spent (ignoring the satoshi ones). I havent had a chance to get an up to date utxo bitmap, but I hope to get one in the next week.

At the 30MB size or smaller, the entire utxo bitmap will fit into CPU L cache and provide 10x faster performance than RAM. RAM is actually quite slow compared to things that can be done totally inside the CPU.

James

Title: Re: An optimal engine for UTXO db
Post by: jl777 on March 05, 2016, 04:12:57 PM

Quote from: 2112 on March 05, 2016, 04:45:33 AM

It really is a pointless question.

Quote from: piotr_n on March 04, 2016, 08:17:49 PM

I think it's very important to be able to browse through all the records in a shortest possible time.

All users value speed. Otherwise we would still be using the slowest CPU that will eventually complete the tasks.

While speed and reliability can be a tradeoff decision, it is actually possible to get more reliability and speed at the same time. For example, if you are needing many subsystems to all work vs one, you can get a more reliable result that is based on just one thing, vs one that requires many parts.

Of course it is most likely that any such speedy and reliable system is more complicated to write, but once written it can be validated

James

Title: Re: An optimal engine for UTXO db
Post by: piotr_n on March 05, 2016, 07:47:23 PM

Quote from: jl777 on March 05, 2016, 03:56:58 PM

I'm sorry I don't have time to dig into your code, though I'm sure its a good stuff.
So how do you actually store the db in memory, in some simple words?

I use hashmaps and its pretty fast, but also very primitive.
Plus it takes loads of memory and ages to load from the disk.
And writing it into disk - separate problem.

I'm sure there must be so much space to improve it.

Can you say how many seconds you need to browse through all the current utxo records and calc, lets say, the sum of all the values.
With my engine I cannot get below 4 seconds anymore.

UTXO db is over 35 million records now.
It really needs a better shit than leveldb

Title: Re: An optimal engine for UTXO db
Post by: jl777 on March 05, 2016, 08:23:20 PM

Quote from: piotr_n on March 05, 2016, 07:47:23 PM

Quote from: jl777 on March 05, 2016, 03:56:58 PM

In its raw form with all the data scattered across the 200 bundles, a simplistic search for a specific address took 2.5 milliseconds on a 1.4Ghz i5. Not a parallel process, but serial. Using faster CPU and 8 cores, I think times of 100 nanoseconds could do a complete search and sum for a specific address.

But if the usecase is to keep all the totals updated, then after each bundle is completed, the thing to do is update the utxo as of that bundle boundary. That is a one time calculation of adding the spends starting from the previous bundle boundary.

What I am saying is to create write once data set into memory mapped files. So it is stored in the file. memory mapping it allows the CPU direct access to it.

The onetime update would iterate through the bundle's spends and update the utxo bitmap, but it would also update the total balances for each address. So at the end of each bundle, there would be the up to date total of all spends for each address.

Within each bundle is also a set of balances for all addresses that occur in that bundle. Again, with a onetime update, all the address balances can be updated with the changes from that bundle. This is not a mainstream case, so I dont do this automatically in iguana, but it wouldnt add too much time to the overall processing assuming these total balances are updated at the end of each bundle boundary.

It would be vin number of additions that occur in each bundle to get the total spends for addresses in that bundle and then one addition for each address to the global balances. Doing it this way, to get an up to date balance would be a matter of indexing into the global balance list and then adjusting it by the realtime bundle's vins and vouts.

In one implementation I updated all the balances as new tx came in, so it took no time to get a total summary. Checking that total by summing all the accounts took a few milliseconds, but sorting it into a rich list did end up taking a few seconds. It used up HDD space for storing all the balances, 16 bytes per address, so it does add up the space required if all addresses are needing to be updated. but a CPU can do tens of millions of adds quite quickly through RAM

James

Title: Re: An optimal engine for UTXO db
Post by: piotr_n on March 05, 2016, 08:29:17 PM

Quote from: jl777 on March 05, 2016, 08:23:20 PM

Yes, but I'm not interested in checking balance on addresses. I just want to know how fast it is to go trough each of the UTXO db records - like to do anything on each single one of them.

As for the rest, the memory mapping, storing it on disk and stuff, I think you're on to something.
But first I just want to know how fast your utxo db works comparing to mine. :)

Title: Re: An optimal engine for UTXO db
Post by: jl777 on March 05, 2016, 08:43:29 PM

Quote from: piotr_n on March 05, 2016, 08:29:17 PM

Quote from: jl777 on March 05, 2016, 08:23:20 PM

it took 2.5 milliseconds to do a listunspents for an address with thousands of tx on my laptop. It could have been any address (once I get all the bugs fixed)

If you wanted to scan all addresses, the totally unoptimized method would be taking a long time as it is designed for parallel searching specific address, but once the optimized dataset is there, then a rich list could be calculated in a few seconds.

Not sure what sort of things "like to do anything on each single one of them" you would want to do across all UTXO. To my thinking, other that total balance, the UTXO is not spanning across all addresses, so if you can extract all the relevant tx per address efficiently, that is sufficient.

Maybe I missed something that needs to rescan all utxo? With my setup importprivkey is not needed as all addresses are basically already there. If the millisecond time is too slow, then it can of course be dereferenced into 6 bytes per utxo.

All utxo are in a linked list inside each bundle and a hash table allows to find the end point directly. Additionally a bloom filter is added to each bundle to quickly determine if there is a need to search within the bundle.

If you just want all utxo to scan through, just iterate through the utxo bitmap, extract the utxo data and put it all into a single memory mapped file. But not sure it is worth spending a GB to do this as all the unspent data is already in memory mapped files, so 6 bytes per unspent is enough to make a list for each address's utxo. Then with a direct lookup of that address, you find all the utxo

struct iguana_unspent // 28 bytes
{
uint64_t value;
uint32_t txidind,pkind,prevunspentind,scriptoffset;
uint16_t hdrsi:14,type:4,vout:14;
} __attribute__((packed));

What I am saying is that there is no need for DB. All operations can be done essentially in RAM via memory mapped files, with a few exceptions for realtime address balances, which need's r/w memory

James

Title: Re: An optimal engine for UTXO db
Post by: piotr_n on March 05, 2016, 08:59:40 PM

Quote

Not sure what sort of things "like to do anything on each single one of them" you would want to do across all UTXO. To my thinking, other that total balance, the UTXO is not spanning across all addresses, so if you can extract all the relevant tx per address efficiently, that is sufficient.

ok, so let me ask you this way.

you have your utxo database, or so you call it.
you get a new transaction and you have to verify it.
how much does it take?

i hope you know that you need to fetch each tx's input from the db, verify its consistency and then put the changed records back to the db.

Title: Re: An optimal engine for UTXO db
Post by: jl777 on March 05, 2016, 09:25:01 PM

Quote from: piotr_n on March 05, 2016, 08:59:40 PM

Quote

FETCH:
For each vin, the txid/vout is mapped to the corresponding unspentind
I have a bitwise hashtable that is at most 50% full, so an openhash table works pretty well. I think the worst case is about 20 collisions, but each one is just a matter to increment to the next item.

once the txid is found, the vout tells us where the unspent is, as each txid has the firstvout as part of its data. at most 200, but on average scanning backwards finds the txid quicker than N/2 bundles for obvious reasons.

Using the listunspent as a comparison, that has to also traverse a linked list and it would need to do that for each bundle, so the updating utxo would be an order magnitude less work. Dont have benchmarks yet, but probably in the 100 microsecond range

VERIFY:
Once the unspentind is retrieved, the utxo bitmap is checked (instant)

PUT:
assuming all is well, the utxo bitmap bit is set to spent (instant)

On startup the utxo bitmap does need to be regenerated and updated from the last bundle boundary. I plan to make utxo dataset saves more frequently than 2000 block boundaries, maybe every 10, so only 10 blocks worth needs to be replayed to get to a current utxo set

James

Title: Re: An optimal engine for UTXO db
Post by: piotr_n on March 05, 2016, 09:28:08 PM

sounds like a good stuff.
I'll give it a better look once I get sober :)

so can you refer to some specific code that implements this?

Title: Re: An optimal engine for UTXO db
Post by: jl777 on March 05, 2016, 09:40:16 PM

Quote from: piotr_n on March 05, 2016, 09:28:08 PM

sounds like a good stuff.
I'll give it a better look once I get sober :)

so can you refer to some specific code that implements this?

github.com/jl777/SuperNET

in the SuperNET/iguana dir the iguana_*.c files have this, mostly in iguana_ramchain.c. No comments to speak of and I like to pack as much into each line as possible, so it is 3x to 5x denser than most the C code out there.

I will properly document everything once it is all finalized.

As you might imagine, it is quite tricky to get all the bits in the right place, so validating the dataset is my current priority. I added ability for it to act as a lossless codec for the rawtransactions, which would allow a brute force comparison with the reference bitcoind.

Once that is confirmed, then the next step is to validate the block explorer level data, but I am not familiar enough with any to generate the reference data. Hopefully someone around here is.

Once all the balances for all the addresses are verified at all the bundle boundaries, then "all" that is left is the realtime bundle being validated. The brute force way would just be to generate a bundle without all the blocks, as each new block come in. I guess it wont be so bad as both the bloom filter and the open hashtable was designed to be able to be incrementally added to.

James

Title: Re: An optimal engine for UTXO db
Post by: piotr_n on March 05, 2016, 09:46:40 PM

Quote from: jl777 on March 05, 2016, 09:40:16 PM

Once all the balances for all the addresses are verified at all the bundle boundaries, then "all" that is left is the realtime bundle being validated. The brute force way would just be to generate a bundle without all the blocks, as each new block come in. I guess it wont be so bad as both the bloom filter and the open hashtable was designed to be able to be incrementally added to.

Although I'm not quite sure what you mean by "brute force" among other things, indeed the way to go is to keep just a blockchain secured snapshot of the utxo db at a certain block.
Its the only way to go.
Everybody knows that, but nobody does anything about it :)

Title: Re: An optimal engine for UTXO db
Post by: jl777 on March 05, 2016, 09:56:02 PM

Quote from: piotr_n on March 05, 2016, 09:46:40 PM

Quote from: jl777 on March 05, 2016, 09:40:16 PM

Although I'm not quite sure what you mean by "brute force", indeed the way to go is to keep just a blockchain secured snapshot ot the utxo db.
Its the only way to go.
Everybody knows that, but nobody does anything about it :)

I have sha256 hashes of all the internal datasets, so it is possible to quickly find out where any mismatch is between two datasets (though it is contaminated with endian dependence at this time). How to know who put the utxo hash into the blockchain? If there is a way to do this without it being vulnerable to attacker posting his version, then it would be great to have

by bruteforce, I mean iterating through each block, tx, address and compare results to a known good data source.

Once it confirms at a specific bundle boundary, then it is all wrapped into a read-only filesystem, which by their nature are pretty tamper proof and also adds a standard lz compression/decompression layer transparently

This means the only things that would ever need to be recalculated are the most recent bundle (max 2000 blocks) and even that is possible to be reduced with partial bundles saved as the realtime bundle grows. It is a tradeoff between startup calculation time and runtime backups

James

Title: Re: An optimal engine for UTXO db
Post by: watashi-kokoto on March 05, 2016, 10:38:09 PM

Quote from: piotr_n on March 04, 2016, 08:17:49 PM

I think it's very important to be able to browse through all the records in a shortest possible time.

I disagree, the major requirements

verify utxo spend
verify utxo amount
remove used txo
add new utxos from block
reorganize revert utxo

Title: Re: An optimal engine for UTXO db
Post by: jl777 on March 05, 2016, 10:51:52 PM

Quote from: watashi-kokoto on March 05, 2016, 10:38:09 PM

Quote from: piotr_n on March 04, 2016, 08:17:49 PM

I think it's very important to be able to browse through all the records in a shortest possible time.

I disagree, the major requirements

verify utxo spend
verify utxo amount
remove used txo
add new utxos from block
reorganize revert utxo

yes, this was my thinking with my design. most of the operations above are very fast.
adding utxo from new block does require updating all the various datasets, but all in all not so much time

the reorg revert utxo is the only slow one, but I delay finalizing the bundle until 10 blocks past, so the odds for reorg to invalidate bundle are pretty small. If it happens, several minutes to regenerate the bundle file.

the realtime bundle should be small enough so any reorg can simply recalculate from the max reorg depth

Title: Re: An optimal engine for UTXO db
Post by: 2112 on March 06, 2016, 01:35:51 AM

Quote from: piotr_n on March 05, 2016, 08:35:27 AM

I had this discussion about 3 years ago with etotheipi, the ex-CEO of ex-Armory:

New blockchain management in Armory

https://bitcointalk.org/index.php?topic=144015.0

Now we are sure that Armory was a total loss as a business: they delivered promised software but achieved exactly zero sales, not even a consulting contract for the ex-employees.

I'm not going to repeat the arguments from 3 years ago, interested readers could study the Armory trainwreck.

AFAIK Armory sometime in 2015 also understood that LevelDB is not really suitable and attempted to change the underlying engine to LightningDB:

https://en.wikipedia.org/wiki/Lightning_Memory-Mapped_Database

Notwithstanding the above I agree with you that distributed cyptocurrencies both pose unique requirements and provide unique opportunities for optimization. In particular the traditional database ACID is an overkill for UTxO, something much simpler will work due to extraordinary replication that is fundamental to the concept of distributed cryptocurrencies.

I always like to collect the losers who deny the need for transactional consistency is critical. GLBSE and ASICMINER were the two most well known occurrences of double payment. Couple of days ago BitBet joined the club, this time blaming it on miners. I'm going to quote the whole Qntra article for my future reference here:

Quote from: http://qntra.net/2016/03/a-miner-problem/

A Miner Problem

Posted on March 2, 2016 by Mircea Popescu

As announced in #bitcoin-assets, BitBet was attacked earlier today through a transaction withholding mechanism. The attack unfolded as follows :

1. A BitBet bet (Jeb Bush will be Republicans' 2016 Presidential Nominee) was closed and resolved on 21 February. This created a list of winners with a reasonable expectation to be paid their winnings.

2. A first transaction was broadcast, to satisfy the claims of the winners, spending some inputs, and offering a fee of 0. This transaction was, as you'd expect, neither mined nor included in the mempools of most Bitcoin nodes.

3. A second transaction was broadcast, spending the same inputs as A1, including a fee of 0.0001, call it A2. For a ~1.5kb transaction, this fee is on the low side, so it perhaps could be argued that it not being mined would be expected. Nevertheless, transaction A2 was also not included in the mempools of most Bitcoin nodes.

4. As neither transaction A1 or A2 were mined after 54 (48) hours, a further transaction was broadcast, spending the same inputs as A1 and A2, and including a fee of 0.000175, call it A3. By any measure, a fee in excess of 10 satoshi per byte should be sufficient to have transactions mined. Nevertheless, contrary to expectation, transaction A3 was not included in either a block or the mempools of most Bitcoin nodes.

5. After a further 48 hours, a fourth transaction was broadcast, spending the same inputs as A1, A2 and A3, and including a fee of 0.00022, call it A4. Just like the previous three, transaction A4 was not either included in a block or advertised by most Bitcoin nodes.

6. After a further 16 hours, transaction B was broadcast, that included the same outputs as transactions A1-A4, but different inputs. Transaction B, like transaction A4, included a fee of 0.00022. Transaction B was advertised by most Bitcoin nodes immediately thereafter, and was included in a block within half hour of being broadcast.

7. Two hours after B was included in a block, transaction A1 was re-broadcast by an unknown third party. Twenty minutes later, the 0 fee, week old, not-advertised-by-anyone-ever transaction was included in a block.

On the basis of these events, the following allegations can readily be supported :

• That the notion of "a majority of Bitcoin nodes" is void of content, most of the relay network being under the control of the same entity and supporting functionality not contemplated by the Bitcoin protocol (such as selective mothballing of specific transactions).(1) This specifically means that "someone"(2) has the ability to nuke transactions out of the network irrespective of whether they are validly signed or not, directly replicating functionality already available to fiat governments in fiat payment processors such as Visa or Paypal.

• That a cartel of Bitcoin miners is deliberately and systematically withholding blocks for an interval of about 20 minutes to a half hour, so as to ensure themselves a (significant) advantage over any would-be competitors.(3)

• That neither above item makes sense or could long survive without the other, meaning that necessarily the culprit is one and the same. Note for the record that an entity controlling 51% of the network (as is by a large margin the case with the Antpool/F2pool cartel) can safely withhold blocks for an indefinite period – if a competitor publishes a longer chain all the cartel has to do is keep mining on its own, eventually they'll prevail.(4)

Note that there are no alternative theories that more parsimoniously explain the chain of observed phenomena, and consequently the validity of the above allegations is a matter of objective truth. One may disagree on the strength of subjective feeling, or put forth his own interpretations, but these – much as in the case of alternative explanations for biological evolution – suffer from a fundamental logical weakness.

Update March 2, 2016 18:49 UTC: In a further development on this story, Bitbet has now announced a moratorium on payments until the issue is resolved. (N. ed.)

1. You should remember that the Bitcoin relay network has for many years been in very poor shape – there were as little as 60 nodes active last year around this time. Since then the situation has only deteriorated – the jury is still out as to whether BlueMatt's doohicky actually helped or hindered while it was functional, but there's no argument that the network degraded YoY.

2. Who, according to your own preference, may be the NSA or the Antpool/F2pool cartel.

3. The A1 transaction wasn't broadcast and then included in a block – at the time it was broadcast, the block containing it had already been mined – it simply hadn't yet been shared with the rest of the plebs is all.

4. 51% means that out of a day's 144 blocks, one has 74, and consequently gains a 3 block advantage over a competitor chain every single day..It is true that the cartel does not generally wish to advertise its presence (yet), and so to date they've avoided major reorgs. Bear in mind however that it is also the case that the remainder minority is not united but fragmented – so they don't actually need to resort to major reorgs.

Edit: formatting fixes

Title: Re: An optimal engine for UTXO db
Post by: jl777 on March 06, 2016, 04:53:28 AM

Quote from: 2112 on March 06, 2016, 01:35:51 AM

Quote from: piotr_n on March 05, 2016, 08:35:27 AM

Quote from: http://qntra.net/2016/03/a-miner-problem/

Edit: formatting fixes

regardless of what "DB" is used, of course transactional consistency is critical. Not sure how you got the impression that this was not the case.

Now in the case the network does a deep reorg, then regardless of what DB or non-DB is used, things will be out of sync, until the deep reorg propagates. As long as the non-DB solution generates the same results as a DB solution, then whatever the external issues are, are external to the local storage methods

James

Title: Re: An optimal engine for UTXO db
Post by: amaclin on March 06, 2016, 06:27:11 PM

http://tarantool.org/

Title: Re: An optimal engine for UTXO db
Post by: jl777 on March 06, 2016, 08:22:06 PM

Quote from: amaclin on March 06, 2016, 06:27:11 PM

http://tarantool.org/

Looks pretty interesting. I was looking for a lightweight DB to use for storing the privkeys for the wallet.

Thanks!

from the custom RAM data structures I have achieve millions of operations per second, but it all depends on the specifics as to what performance level is possible. Getting the entire dataset to fit inside the CPU's Lx cache is what gets things going really, really fast. Once things need to "swap" out to the DRAM, it slows down sharply. And if it has to hit the HDD... even if it is SSD, it is an order of magnitude or more, slower.

But for non-performance critical things like tracking wallet privkeys, these performance issues are not really an issue. (unless you are satoshi and have thousands and thousands of privkeys)

James

Title: Re: An optimal engine for UTXO db
Post by: AlexGR on March 07, 2016, 07:17:48 PM

There are some intel chips with 128MB L4 cache (shared cpu/gpu). Obviously the further "away" it is from the cores, the slower it will be, but still you'd expect it to be at least twice as fast as the ram.

Title: Re: An optimal engine for UTXO db
Post by: jl777 on March 07, 2016, 08:55:27 PM

Quote from: AlexGR on March 07, 2016, 07:17:48 PM

http://www.extremetech.com/extreme/188776-how-l1-and-l2-cpu-caches-work-and-why-theyre-an-essential-part-of-modern-chips

https://forums.aida64.com/topic/2864-i7-5775c-l4-cache-performance/

The 10x to 40x speed improvements creates very nonlinear performance boosts when the dataset is small enough

James

Title: Re: An optimal engine for UTXO db
Post by: piotr_n on March 08, 2016, 04:47:31 PM

Quote from: amaclin on March 06, 2016, 06:27:11 PM

http://tarantool.org/

this looks like a right way to go, but..

1) I failed to install the software on my linux, following the docs. Or to build the sources....

2) I don't think TCP based API is suitable for the needs of a DB we're discussing here

how do you imagine browsing trough a content of the current 35M+ UTXO records whit this TCP based API?

Title: Re: An optimal engine for UTXO db
Post by: jl777 on March 08, 2016, 05:29:21 PM

Quote from: piotr_n on March 08, 2016, 04:47:31 PM

Quote from: amaclin on March 06, 2016, 06:27:11 PM

http://tarantool.org/

If you just want an embeddable DB, http://sophia.systems/ is pretty good. I see they released a version 2 with hybrid in-memory, which is good as my feedback to them was having everything HDD made it too slow.

The sophia 1.2 was 10x smaller than BDB and much faster, so 2.0 version should be that much better.

but nothing would have the performance/footprint profile of the read-only memory mapped bundles. I just cant double the size of the code just to add a DB, especially as I could just use a deterministic reverse hash chain for a wallet's keypairs. Loss of the wallet passphrase would compromise the privkeys, but it seems the same result as storing independent privkeys in a DB encrypted with the same passphrase.

Making slow but steady progress with the debugging and documenting it here: https://bitco.in/forum/threads/debugging-status.950/

James

Title: Re: An optimal engine for UTXO db
Post by: piotr_n on March 08, 2016, 05:56:32 PM

James, can you please describe how you actually implemented your db?
I'm especially interested in the index part, that (according to you) fits into the CPU's cache.

I've been trying to read through your code, but its shit loads and I don't have that much patience.
I once implemented sipa's secp256k functions in Go, so I do have some patience... but it has its limits :)

Say, I'd like to implement the read-only database you made, in another language - what does it take?

Title: Re: An optimal engine for UTXO db
Post by: jl777 on March 08, 2016, 06:13:25 PM

Quote from: piotr_n on March 08, 2016, 05:56:32 PM

You would need a high end CPU to fit it all in L caches and not all the "DB" would fit into CPU cache, unless you partitioned queries for the different datasets to different cores and strongly influenced the content of each CPU's cache

iguana_ramchain.c has most of what is needed, but right now it is a construction zone as I just change the data structures to support lossless encoding/decoding and quite tricky doing that with all the constraints.

Assuming you have already the blockchain internally, it might be easier to just directly generate the bundle files. The key is the encoding and if interop is not a constraint, then you can just use a similar model without making it exactly compatible. Of course, having an independent implementation would be a fantastic cross check, so if you are up for that we can work out a strict memory layout. Then you will know you made things bugfree if it makes the identical file and debugging is quite the chore without this...

I am starting to describe in some detail the internal data formats: https://bitco.in/forum/forums/iguana.23/

The codebase is highly tuned and optimized to do the parallel sync and due to this, I have each peer's thread create the first pass file(s). Then helper threads aggregate a bundle and generate the bundle file. the ramchain_iterate handles both the raw and expanded forms, so maybe a port of iguana_ramchain.c is a good way to port it.

The functions ending with PT indicate it is running in a peer's thread and not all functions are needed. the bundlesaveHT function runs from a helper thread and it is the one doing the combining

James

Title: Re: An optimal engine for UTXO db
Post by: piotr_n on March 08, 2016, 07:44:44 PM

Quote from: jl777 on March 08, 2016, 06:13:25 PM

Assuming you have already the blockchain internally, it might be easier to just directly generate the bundle files. The key is the encoding and if interop is not a constraint, then you can just use a similar model without making it exactly compatible.

Sounds good, but can you express it in a way that I could understand, what does it actually mean to do for the software?

Title: Re: An optimal engine for UTXO db
Post by: jl777 on March 08, 2016, 07:51:40 PM

Quote from: piotr_n on March 08, 2016, 07:44:44 PM

Quote from: jl777 on March 08, 2016, 06:13:25 PM

Sounds good, but can you express it in a way that I could understand, what does it actually mean to do for the software?

to encode the tx data in the following format:

struct iguana_txid { bits256 txid; uint32_t txidind,firstvout,firstvin,locktime,version,timestamp,extraoffset; uint16_t numvouts,numvins; } __attribute__((packed));

struct iguana_unspent { uint64_t value; uint32_t txidind,pkind,prevunspentind,scriptoffset; uint16_t hdrsi:14,type:4,vout:14; } __attribute__((packed));

struct iguana_spend { uint32_t spendtxidind,scriptoffset:31,coinbase:1; int16_t prevout; uint16_t numsigs:4,numpubkeys:4,p2sh:1,sighash:4,external:1,sequenceid:2; } __attribute__((packed));

struct iguana_pkhash { uint8_t rmd160[20]; uint32_t pkind,pubkeyoffset; } __attribute__((packed));

I describe this in the iguana child board in a lot more detail

Since I am not you, I do not know what expression requirements exist to allow proper parsing and semantic understanding by you.

Maybe you can ask the specific questions after reading the iguana threads?

James

Title: Re: An optimal engine for UTXO db
Post by: piotr_n on March 08, 2016, 07:54:56 PM

this is my preferred expression.
thank you.

i think its nice and I will look into it more.

Title: Re: An optimal engine for UTXO db
Post by: piotr_n on March 08, 2016, 08:18:36 PM

so say, @james you only have the blocks data - from 1 to 400000. stored close, on ssd disk or whatever. say ramdisk :)
how long does it "load" for you to have all the database content of the block 400000?

Title: Re: An optimal engine for UTXO db
Post by: jl777 on March 08, 2016, 08:52:26 PM

Quote from: piotr_n on March 08, 2016, 08:18:36 PM

do you mean reprocessing all the raw data from scratch? With 8 cores that would take 30 minutes or less depending on if you validate all sigs from block0 or not, maybe a bit longer.

If you have all the processed bundle files, then in the time it takes to memory map 200 files, the data set is accessible to the CPU. So that would be up to the OS/HDD, but I am seeing it map all the files in a few seconds on a lowend laptop and less than a second on a VPS server.

Once the bundle files are mapped, then a block explorer (not blockchain) level data is available for all addresses, blocks, tx

James

Title: Re: An optimal engine for UTXO db
Post by: piotr_n on March 08, 2016, 09:00:54 PM

thanks. i like it.

Title: Re: An optimal engine for UTXO db
Post by: jl777 on March 10, 2016, 11:55:34 AM

Quote from: piotr_n on March 08, 2016, 09:00:54 PM

thanks. i like it.

if you want really fast speed, you can make a CUDA/OpenSSL version and use a dedicated core per bundle, among other optimizations.

With most all the data being read only, the biggest challenge of GPU coding (out of sync data between cores) is already solved.

It might take a few GPU cards to store all the dataset, but then all RPC to scan the entire blockchain happens in milliseconds

using the CPU for the latest bundle can create a system that keeps up with new blocks.

Who said there was a scaling problem with bitcoin?

James

Title: Re: An optimal engine for UTXO db
Post by: amaclin on March 10, 2016, 01:42:13 PM

Quote from: jl777 on March 10, 2016, 11:55:34 AM

Who said there was a scaling problem with bitcoin?

me. and I am ready to repeat it again

Title: Re: An optimal engine for UTXO db
Post by: jl777 on March 10, 2016, 05:36:02 PM

Quote from: amaclin on March 10, 2016, 01:42:13 PM

Quote from: jl777 on March 10, 2016, 11:55:34 AM

Who said there was a scaling problem with bitcoin?

me. and I am ready to repeat it again

please be specific

What part of bitcoin makes it not able to scale?

I have solved the "takes too long to sync" issue. I think with interleaving, a much greater tx capacity is possible. Using BTC as the universal clock for all the other cryptos will let smaller chains deal with lower value transactions. And now a way to use GPU for blockchain operations.

James

Title: Re: An optimal engine for UTXO db
Post by: 2112 on March 13, 2016, 04:37:05 AM

Quote from: jl777 on March 05, 2016, 04:12:57 PM

Otherwise we would still be using the slowest CPU that will eventually complete the tasks.

This is a self-refuting argument.

Hardware gets faster therefore people are willing to put up with slower software.

Do not confuse hardware speed and software speed when discussing speed of the whole system.

Edit: A good example of clear thinking from somebody who isn't professionally doing software development:

Quote from: watashi-kokoto on March 05, 2016, 10:38:09 PM

Quote from: piotr_n on March 04, 2016, 08:17:49 PM

I think it's very important to be able to browse through all the records in a shortest possible time.

I disagree, the major requirements

verify utxo spend
verify utxo amount
remove used txo
add new utxos from block
reorganize revert utxo

Title: Re: An optimal engine for UTXO db
Post by: jl777 on March 13, 2016, 04:59:41 AM

Quote from: 2112 on March 13, 2016, 04:37:05 AM

Quote from: jl777 on March 05, 2016, 04:12:57 PM

Otherwise we would still be using the slowest CPU that will eventually complete the tasks.

Quote from: watashi-kokoto on March 05, 2016, 10:38:09 PM

Quote from: piotr_n on March 04, 2016, 08:17:49 PM

I think it's very important to be able to browse through all the records in a shortest possible time.

I disagree, the major requirements

verify utxo spend
verify utxo amount
remove used txo
add new utxos from block
reorganize revert utxo

If you think I am ignoring overall system speed, then you are not really reading my posts.
I also like kokoto's requirements and my dataset is designed to get good overall system performance for those, and also to be able to do blockexplorer level queries from the same dataset for lower level blockchain queries.

Not sure what your point is? Maybe you can try to run a DB system written in JS? I dont know of any that is anywhere fast enough.

It is not a choice between DB vs no DB, but rather no-DB vs nothing, as my requirement is for a system that works from a chrome app

James

Title: Re: An optimal engine for UTXO db
Post by: 2112 on March 13, 2016, 05:13:25 AM

Quote from: jl777 on March 13, 2016, 04:59:41 AM

You are making 2 unrelated mistakes:

1) mixing hardware speed with software speed. On fast hardware it is OK to have slow software.

2) your tests are only including sync speed (both initial and re-sync after getting temporary offline). You don't test continuous operation nor chain reorganization.

This has nothing to do with using or not using a DBMS. It is a more fundamental question what your system will be capable of when finished.

Title: Re: An optimal engine for UTXO db
Post by: jl777 on March 13, 2016, 05:21:23 AM

Quote from: 2112 on March 13, 2016, 05:13:25 AM

Quote from: jl777 on March 13, 2016, 04:59:41 AM

1) of course on fast hardware it is ok to have slow software. but slow software on slow hardware is not acceptable. And if you have to write fast software for the slow system, there is no point to write slow software is there?

2) initial sync and re-sync are very important. continuous operation will be running out of RAM directly, combining data from the readonly dataset. And it is not so easy to test things before they are running. I am assuming that chain reorgs wont go past 100 blocks very often. In the event it does, you would have a few minutes of downtime to regenerate the most recent block.

If it has nothing to do with DBMS, then why did you keep doing the database class thing?

As far as performance of iguana when it is complete, it will be fast for things that typically are very slow, ie. importprivkey.

Title: Re: An optimal engine for UTXO db
Post by: hhanh00 on March 13, 2016, 06:05:26 AM

To answer the OP, it is difficult to decide what makes an optimal engine for the UTXO set because as usual one has to make design trade offs.

1. If you are interested in super fast import, i.e. processing the block chain in shortest time, you are dealing with a large amount of insert/delete of key value pairs where the key contains a crypto hash. They are essentially random values and will fall all over the place. However, the net amount fits in an average development machine so you can work entirely in RAM. However, bitcoin core can't assume that you have several GB of RAM available!

Here, the main problem of leveldb is that its engine uses skip lists. Their average search complexity is ln(N). You'd be better off with a hash table which is O(1). For the record, skip lists and B-Trees are used when we need to retrieve the keys in order and it's not the case here.

Because blocks depend on previous blocks, processing them in parallel isn't easy. I'm not sure it helps even. (Block validation excluded)

2. If you are building a browser, then you can precalculate the hell out of your data set. The usual database tricks apply: lineralize the key space, compress values, use bloom filters, partition your data, etc. You are working with data than changes only every 10 mn, so very quite static. If you need the unconfirmed tx, add them on top. This case isn't what a db is really used for anyway. A lot of effort is put into concurrency and durability. When they are not crucial, faster implementations exist - especially if you can choose the hardware.

3. If it's for checking transactions in a full node, there are relatively few of them. Once you have extracted the UTXO, we are talking about a few million records which is peanuts in today's standards.

In conclusion, leveldb is fine for its usage it bitcoin core. The choice of a skip list vs hash table engine is questionable, but I don't think that makes a big difference for most of the users.

Maybe, you could describe your usage?

Title: Re: An optimal engine for UTXO db
Post by: 2112 on March 13, 2016, 06:27:54 AM

Quote from: jl777 on March 13, 2016, 05:21:23 AM

If it has nothing to do with DBMS, then why did you keep doing the database class thing?

Because the title of this thread is about DBMS a.k.a. DB engines.

Because of the fact that the data stored is actually about money.

You just have zero experience dealing with financial data processing, and actually less-than-zero as far as legally-responsible fiduciary-duty-bound handling of money for other people.

Therefore what you are doing are just educational toys. I have nothing against you designing and playing with those toys. But if you thinking that you are making something that will become a real financial product you set yourself for a CIYAM-level disappointment.

https://bitcointalk.org/index.php?topic=1375689.msg13995422#msg13995422

Quote from: CIYAM

The CIYAM project has got a lot of criticism about being extremely hard to understand and I'd like to explain exactly why this is.

Initially the project was a closed source one originating from Australia (where I grew up) but once I moved to China in late 2006 I decided that due to the problems of enforcing copyright in China (i.e. no chance) that I would need to make the software "hard to understand" (a sort of security by obscurity approach).

So when I created the initial open source version of CIYAM it inherited years of such development that I had worked on in China (around 6 years).

I no longer actually support the ideas of patent and copyright so I actually don't care too much now about that - what I worked out how to do was to just make a software system so difficult to understand that those things are irrelevant (no-one has even bothered to try and fork my project and modify it).

Bitcoin Forum

Bitcoin => Development & Technical Discussion => Topic started by: piotr_n on March 04, 2016, 08:17:49 PM