wow, that's a LOT of changes... practically speaking, will a segwit tx work for sending to an old wallet, or do both sides need to run segwit for it to be spendable? It seems that would be the case. If so, doesn't that create a lot of problems along the lines of "I sent you this txid, but you need this wtxid to be able to spend it, oh and the new updated wallet that supports segwit isn't available yet from your vendor"
|
|
|
I can't find the changes needed to be made to support segwit.
It must change the protocol and blockchain format, so I would imagine there is some obvious place I overlooked where to find it.
James
|
|
|
If you think I am ignoring overall system speed, then you are not really reading my posts. I also like kokoto's requirements, and my dataset is designed to get good overall system performance for those, and also to be able to do blockexplorer-level queries from the same dataset for lower-level blockchain queries.
Not sure what your point is? Maybe you can try to run a DB system written in JS? I don't know of any that is anywhere near fast enough.
It is not a choice between DB vs no DB, but rather no-DB vs nothing, as my requirement is for a system that works from a chrome app
James
You are making 2 unrelated mistakes: 1) mixing hardware speed with software speed. On fast hardware it is OK to have slow software. 2) your tests only cover sync speed (both initial and re-sync after going temporarily offline). You don't test continuous operation or chain reorganization. This has nothing to do with using or not using a DBMS. It is a more fundamental question of what your system will be capable of when finished.

1) of course on fast hardware it is OK to have slow software, but slow software on slow hardware is not acceptable. And if you have to write fast software for the slow system, what is the point of writing slow software?

2) initial sync and re-sync are very important. Continuous operation will run directly out of RAM, combining data from the read-only dataset. And it is not so easy to test things before they are running. I am assuming that chain reorgs won't go past 100 blocks very often. In the event one does, you would have a few minutes of downtime to regenerate the most recent block.

If it has nothing to do with a DBMS, then why did you keep doing the database class thing? As far as performance of iguana when it is complete, it will be fast at things that typically are very slow, i.e. importprivkey.
|
|
|
Otherwise we would still be using the slowest CPU that will eventually complete the tasks.
This is a self-refuting argument. Hardware gets faster, therefore people are willing to put up with slower software. Do not confuse hardware speed and software speed when discussing the speed of the whole system.

Edit: A good example of clear thinking from somebody who isn't professionally doing software development: I think it's very important to be able to browse through all the records in the shortest possible time.
I disagree. The major requirements are:
- verify utxo spend
- verify utxo amount
- remove used txo
- add new utxos from block
- reorganize / revert utxo

If you think I am ignoring overall system speed, then you are not really reading my posts. I also like kokoto's requirements, and my dataset is designed to get good overall system performance for those, and also to be able to do blockexplorer-level queries from the same dataset for lower-level blockchain queries.

Not sure what your point is? Maybe you can try to run a DB system written in JS? I don't know of any that is anywhere near fast enough. It is not a choice between DB vs no DB, but rather no-DB vs nothing, as my requirement is for a system that works from a chrome app

James
|
|
|
LevelDB is used to store the UTXO set. How is that read only?

The UTXO set falls into the write-once category. Once an input is spent, you can't spend it again. The difference with the UTXO set is explained here: https://bitco.in/forum/threads/30mb-utxo-bitmap-uncompressed.941/

So you can calculate the OR'able bitmap for each bundle in parallel (as soon as all its prior bundles are there). Then to create the current utxo set, OR the bitmaps together. What will practically remain volatile is the bitmap, but the overlay bitmap for each bundle is read only. This makes a UTXO check a matter of finding the index of the vout and checking a bit.

James

Not sure if we are talking about the same thing. Following your link, it seems you are describing the internal data structures used by a block explorer, which aren't necessarily optimal for a bitcoin node. In particular, you use a 6 byte locator. Given a new incoming transaction that can spend any utxo (hash+vout), do you need to map it to a locator? And if so, how is it done?

iguana is a bitcoin node that happens to update a block explorer level dataset. The data structures are optimized for parallel access, so multicore searches can be used; however, even with a single core searching linearly (backwards), it is quite fast to find any txid. Each bundle of 2000 blocks has a hardcoded hash table for all the txids in it, so it is a matter of doing a hash lookup in each bundle until it is found. I don't have timings for fully processing a full block yet, but I don't expect it would take more than a few seconds to update all vins and vouts. Since txids are already high entropy, there is no need to do an additional hash, so I XOR all 4 64-bit long ints of the txid together to create an index into an open hash table, which is created to be never more than half full, so it will find any match in very few iterations. Since everything is memory mapped, after the initial access to swap it in, each search will take less than a microsecond
|
|
|
This is a very informal proof, because I wanted it to be as readable as possible for the majority of readers. I hope this will finally show why Proof of Stake (PoS) is not a viable consensus design.
Ok, now please provide a formal proof for the minority of readers who can't understand an informal one (e.g. me).

@kushti I think the logic used in this thread is that, given that we assume A inevitably leads to B, since A is self-evident, then B is too. It is hard to argue with that sort of logic, as it allows you to prove conclusively that B is true; it doesn't matter what B is, just as long as A is self-evident. Like this: we will assume that at temperatures above absolute zero it is inevitable that the moon is made of cheese. Since we are not all frozen at absolute zero, it is clear that the moon is made of cheese. I think formally it would be: assume A -> B and A is true, therefore B is true.

James

Well then the burden is to prove A. Why is it assumed "self evident"?

Because it is in the OP, so it has to be true
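The inference form named in the post (modus ponens) is indeed valid and trivially machine-checkable; for instance, in Lean:

```lean
-- Modus ponens, exactly as stated in the post:
-- from A → B and a proof of A, conclude B.
example (A B : Prop) (h : A → B) (a : A) : B := h a
```

Which is the point being made: the inference step is never the problem. The entire burden of such a "proof" rests on establishing the premise A, and "it is in the OP" does not discharge it.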
|
|
|
but yes, use hashes for non-permanent data, index for permanent
I thought of a possible issue with offline signing. If you want to offline sign a transaction, then you need to include proof that the index refers to a particular transaction output. Armory already has to include all the inputs in the transaction in order to figure out how much you are spending, so it isn't that big a deal. The problem is that there is no way to prove that a particular index refers to a particular output. You would need to include the header chain, and that is only SPV safe. For offline transactions, using the hash is probably safer, but that doesn't affect internal storage, just whether transactions could refer to previous outputs by index.

Maybe I didn't make it clear in this thread. The indexes are used for traversing lists, etc., but all the txids are in the dataset. So the dataset has indexes referring to other indexes (or implicitly by their position), and each index can be converted to its fully expanded form. Tastes great and less calories. So you can do all the verifications, etc., as it is a local DB/rawfile replacement combined into one.

Being read-only for the most part, mksquashfs can create a half-sized dataset that also acts to protect it from data corruption. I split out the sigs into separate files, so they can be purged after being validated. I am also thinking of doing the same with pubkeys, so nodes that want to be able to search all pubkeys seen could, but nodes that don't won't need to keep them around.

It works as a lossless codec that stores its data in a way that is ready to do searches without any setup time (for the data it has already processed). All addresses are fully indexed, even multisig/p2sh, so there is no need for importprivkey. listunspent serially took ~2 milliseconds on a slow laptop. With the parallel datasets, it is well suited for multi-core searching to allow for linear speedups based on the number of cores.

even GPU could be used for industrial-strength speed, as long as there is a pipeline of requests to avoid the latency issues of using the GPU per RPC

James
|
|
|
why use a DB for an invariant dataset?
That sounds like an exam question for a DBMS 101 course.
1) independence of logical data from physical storage
2) sharing of the dataset between tasks and machines
3) rapid integrity verification
4) optimization of storage method to match the access patterns
5) maintenance of transactional integrity with related datasets and processes
6) fractional/incremental backup/restore while the accessing software is online
7) support for ad-hoc queries without the need to write software
8) ease of integration with new or unrelated software packages
9) compliance with accounting and auditing standards
10) easier gathering of statistics about access patterns
I think those 10 answers would be good enough for an A or A-, maybe a B+ in a really demanding course/school.

OK, you can get an A. However, memory mapped files share a lot of the same advantages you list:
1) independence of logical data from physical storage - yes
2) sharing of the dataset between tasks and machines - yes (you do need both endian forms)
3) rapid integrity verification - yes, even faster, as once verified there is no need to verify again
4) optimization of storage method to match the access patterns - yes, that is exactly what has been done
5) maintenance of transactional integrity with related datasets and processes - yes
6) fractional/incremental backup/restore while the accessing software is online - yes
7) support for ad-hoc queries without the need to write software - no
8) ease of integration with new or unrelated software packages - no
9) compliance with accounting and auditing standards - not sure
10) easier gathering of statistics about access patterns - not without custom code

So it depends on whether 7 to 10 trump the benefits, and whether the resources are available to get it working

James
|
|
|
Bitcoin Core does not store the blockchain in a database (or leveldb) and never has. The blockchain is stored in pre-allocated append-only files on the disk as packed raw blocks in the same format they're sent across the network. Blocks that get orphaned are just left behind (there are few enough that it hardly matters).
OK, so what is the DB used for? Will everything still work without the DB? If the dataset isn't changing, all the lookup tables can be hardcoded
|
|
|
canonical encoding means a numbering system for each block, tx, vin, and vout so that the same number references the same one. Since the blocks are ordered, the tx are ordered within each block, and the vins and vouts are ordered within each tx, this is a matter of just iterating through the blockchain in a deterministic way.

Is it just a count of outputs, or <block-height | transaction index | output index>?

I have no idea how this would cause any privacy loss, as it is just using 32 bit integers as pointers to the hashes. The privacy issue was raised as somehow a reason not to use efficient encoding.
Ahh ok, I guess it is confusion due to the thread split. I agree, I see no loss in privacy by referring to transactions using historical positions. With a hard fork, transactions could have both options. If you want to spend a recent transaction, you could refer to the output by hash. For transactions that are buried deeply, you could use the index. With reorgs, the indexes could be invalidated, but that is very low risk for 100+ confirms.

I used to use a 32-bit index for the entire chain, but that doesn't work for parallel sync, plus in a few years it would actually overflow. Now it is a 32-bit index for txidind, unspentind and spendind (for the txids, vouts and vins within each bundle of 2000 blocks), plus an index for the bundle, which is less than 16 bits. So (bundle, txidind), (bundle, unspentind) and (bundle, spendind) are the corresponding txid, vout and vin within each bundle.

but yes, use hashes for non-permanent data, index for permanent
|
|
|
Thoughts?
In my opinion LevelDB is better than the Berkeley DB Bitcoin used previously. Look at the Berkeley DB wallets! They became corrupted because some mac os guy renamed his wallet.dat while Bitcoin was running. Yes, we know that Mac filesystems are a pile of crap. LevelDB looks like simple key-value storage, so I don't see what exactly the Google engineers screwed up.

why use a DB for an invariant dataset? After N blocks, the blockchain doesn't change, right? So there is no need for the ability to do general DB operations. There are places where a DB is the right thing, but it is like comparing CPU mining to ASIC mining. The CPU can do any sort of calcs, like the DB can do any sort of data operations. An ASIC does one thing, but really fast. A hardcoded dataset is the same. Think of it like an ASIC analogue to a DB. So nothing wrong with DB at all, you just can't compare ASIC to CPU
|
|
|
I think we have to admit that a large part of the BTC blockchain has been deanonymized. And until new methods like CT are adopted, this will only get worse.
So rather than fight a losing battle, why not accept that there are people who don't care much about privacy, and for whom convenience is more important. In fact they might be required to be public. The canonical encoding allows encoding the existing blockchain and future public blockchains much better than any other method, as it ends up as high-entropy compressed 32-bit numbers vs a 32-byte txid + vout. The savings is much more than 12 bytes; it takes only 6 bytes to encode an unspent, so closer to 30 bytes.
What exactly do you mean by "canonical encoding"? What is the privacy loss here?

canonical encoding means a numbering system for each block, tx, vin, and vout so that the same number references the same one. Since the blocks are ordered, the tx are ordered within each block, and the vins and vouts are ordered within each tx, this is a matter of just iterating through the blockchain in a deterministic way. I have no idea how this would cause any privacy loss, as it is just using 32 bit integers as pointers to the hashes. The privacy issue was raised as somehow a reason not to use efficient encoding.

James

yes, I understand that if the blockchain reorgs, the 32-bit indexes will need to be recalculated for the ones affected, and orphans have no index
|
|
|
Who said there was a scaling problem with bitcoin?

me. and I am ready to repeat it again

please be specific. What part of bitcoin makes it not able to scale? I have solved the "takes too long to sync" issue. I think with interleaving, a much greater tx capacity is possible. Using BTC as the universal clock for all the other cryptos will let smaller chains deal with lower-value transactions. And now there is a way to use the GPU for blockchain operations.

James
|
|
|
thanks. i like it.
if you want really fast speed, you can make a CUDA/OpenCL version and use a dedicated core per bundle, among other optimizations. With almost all the data being read only, the biggest challenge of GPU coding (out-of-sync data between cores) is already solved. It might take a few GPU cards to store the whole dataset, but then any RPC that scans the entire blockchain happens in milliseconds. Using the CPU for the latest bundle can create a system that keeps up with new blocks.

Who said there was a scaling problem with bitcoin?

James
|
|
|
I create separate files that just contain the signatures, so pruning them is a matter of just deleting the files and yes, the index data is compressible, about 2x
Why not delete all block files? The algorithm would be:
1) check incoming transactions (wild and confirmed) for their validity, except checking that the inputs are unspent or even exist
2) relay valid transactions to your peers
3) keep only the last 100 (200-500-2016) blocks of the blockchain on the hard drive
This would save 95% of hard disk space

sure, for a pruned node that is fine, but I want a full node with the smallest footprint. but given time I plan to support all the various different modes
|
|
|
Segwit prunes old signatures, and since the signatures are a major source of entropy, it may make the leftover data better compressible

What? I create separate files that just contain the signatures, so pruning them is a matter of just deleting the files. and yes, the index data is compressible, about 2x

What compression are you using? LRZIP is *very* good for large files (the larger the file, the more redundancies it can find). (A typical 130mb block.dat will be down to 97-98mb with gzip/bzip2, but can go under 90 with lrzip.)

I am just using mksquashfs, so I get a compressed read-only filesystem. this protects all the data from tampering, so there is little need to reverify anything. the default is compressing the index data to about half, and the sigs data is in a separate directory now, so after initial validation it can just be deleted, unless you want to run a full relaying node. I haven't gotten a complete run yet with the latest iteration of the data, so I don't have exact sizes. I expect that the non-signature dataset will come in at less than 20GB for the full blockchain.

The reason it gets such good compression is that most of the indices are small numbers that occur a lot, over and over. By mapping the high-entropy pubkeys, txids, etc. to a 32-bit index, not only is there an 8x compression, the resulting index data is itself compressible, so probably about a 12x compression overall. The vins are the bulkiest, but I encode them into a metascript as described in https://bitco.in/forum/threads/vinmetascript-encoding-03494921.942/

James
|
|
|