  Show Posts
921  Bitcoin / Development & Technical Discussion / Re: Segwit details? on: March 15, 2016, 12:08:21 PM
Read the BIPs: https://github.com/bitcoin/bips. They are appropriately named. Their numbers are 14x.
Wow, that's a LOT of changes...

Practically speaking, will a segwit tx work for sending to an old wallet, or do both sides need to run segwit for it to be spendable? It seems that would be the case. If so, doesn't that create a lot of problems along the lines of "I sent you this txid, but you need this wtxid to be able to spend it, oh, and the new updated wallet that supports segwit isn't available yet from your vendor"?

922  Bitcoin / Development & Technical Discussion / Segwit details? N + 2*numtxids + numvins > N, segwit uses more space than 2MB HF on: March 15, 2016, 11:20:53 AM
I can't find the changes that need to be made to support segwit.

It must change the protocol and blockchain format, so I would imagine there is some obvious place I overlooked where to find it.

James
923  Bitcoin / Development & Technical Discussion / Re: An optimal engine for UTXO db on: March 13, 2016, 05:21:23 AM
If you think I am ignoring overall system speed, then you are not really reading my posts.
I also like kokoto's requirements, and my dataset is designed to get good overall system performance for those, and also to serve block-explorer-level queries from the same dataset, in addition to the lower-level blockchain queries.

Not sure what your point is? Maybe you can try to run a DB system written in JS? I don't know of any that is anywhere near fast enough.

It is not a choice between DB vs no DB, but rather no-DB vs nothing, as my requirement is for a system that works from a Chrome app.

James
You are making 2 unrelated mistakes:

1) mixing hardware speed with software speed. On fast hardware it is OK to have slow software.

2) your tests only include sync speed (both initial sync and re-sync after going temporarily offline). You don't test continuous operation or chain reorganization.

This has nothing to do with using or not using a DBMS. It is a more fundamental question of what your system will be capable of when finished.
 

1) Of course on fast hardware it is OK to have slow software, but slow software on slow hardware is not acceptable. And if you have to write fast software for the slow systems anyway, there is no point in writing slow software, is there?

2) Initial sync and re-sync are very important. Continuous operation will run directly out of RAM, combining it with data from the read-only dataset. And it is not so easy to test things before they are running. I am assuming that chain reorgs won't go past 100 blocks very often. In the event one does, you would have a few minutes of downtime to regenerate the most recent block.

If it has nothing to do with a DBMS, then why did you keep doing the database-class thing?

As far as the performance of iguana when it is complete: it will be fast at things that are typically very slow, e.g. importprivkey.
924  Bitcoin / Development & Technical Discussion / Re: An optimal engine for UTXO db on: March 13, 2016, 04:59:41 AM
Otherwise we would still be using the slowest CPU that will eventually complete the tasks.
This is a self-refuting argument.

Hardware gets faster, therefore people are willing to put up with slower software.

Do not confuse hardware speed and software speed when discussing speed of the whole system.

Edit: A good example of clear thinking from somebody who isn't professionally doing software development:
I think it's very important to be able to browse through all the records in a shortest possible time.

I disagree; the major requirements are:

verify utxo spend
verify utxo amount
remove used txo
add new utxos from block
revert utxos on reorganization

If you think I am ignoring overall system speed, then you are not really reading my posts.
I also like kokoto's requirements, and my dataset is designed to get good overall system performance for those, and also to serve block-explorer-level queries from the same dataset, in addition to the lower-level blockchain queries.

Not sure what your point is? Maybe you can try to run a DB system written in JS? I don't know of any that is anywhere near fast enough.

It is not a choice between DB vs no DB, but rather no-DB vs nothing, as my requirement is for a system that works from a Chrome app.

James
925  Bitcoin / Development & Technical Discussion / Re: LevelDB reliability? on: March 12, 2016, 03:21:32 AM
LevelDB being stupid is one of the major reasons that people have to reindex on Bitcoin Core crashes. There have been proposals to replace it but so far there are no plans on doing so. However people are working on using different databases in Bitcoin Core and those are being implemented and tested.

Maybe the most reliable DB is no DB at all? Use efficiently encoded read only files that can be directly memory mapped.

https://bitcointalk.org/index.php?topic=1387119.0
https://bitcointalk.org/index.php?topic=1377459.0
https://bitco.in/forum/forums/iguana.23/

James

LevelDB is used to store the UTXO set. How is that read-only?

The UTXO set falls into the write-once category. Once an output is spent, you can't spend it again. The difference with the UTXO set is explained here: https://bitco.in/forum/threads/30mb-utxo-bitmap-uncompressed.941/

So you can calculate the OR'able bitmap for each bundle in parallel (as soon as all its prior bundles are there). Then, to create the current UTXO set, OR the bitmaps together.

What will practically remain volatile is the bitmap, but the overlay bitmap for each bundle is read-only. This makes a UTXO check a matter of finding the index of the vout and checking a bit.
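
Roughly, the check then looks like this. A minimal C sketch with hypothetical names and sizes (TOTAL_OUTPUTS, spent_bitmap_t, merge_overlays); it illustrates the OR-of-overlays idea, it is not the actual iguana code:

Code:
#include <stdint.h>
#include <string.h>

#define TOTAL_OUTPUTS (1 << 20)              /* hypothetical cap on vouts, one bit each */
#define BITMAP_WORDS  (TOTAL_OUTPUTS / 64)

typedef struct { uint64_t bits[BITMAP_WORDS]; } spent_bitmap_t;

/* each finalized bundle contributes a read-only overlay marking the outputs it spent;
   OR-ing the overlays together yields the current spent set */
void merge_overlays(spent_bitmap_t *merged, const spent_bitmap_t *overlays, int numbundles)
{
    memset(merged, 0, sizeof(*merged));
    for (int i = 0; i < numbundles; i++)
        for (int w = 0; w < BITMAP_WORDS; w++)
            merged->bits[w] |= overlays[i].bits[w];
}

/* the UTXO check: map the vout to its output index once, then test a single bit */
int is_spent(const spent_bitmap_t *merged, uint32_t outputind)
{
    return (int)((merged->bits[outputind >> 6] >> (outputind & 63)) & 1ULL);
}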

James

Not sure if we are talking about the same thing. Following your link, it seems you are describing the internal data structures used by a block explorer, which aren't necessarily optimal for a bitcoin node.
In particular, you use a 6-byte locator. Given a new incoming transaction that can spend any utxo (hash+vout), do you need to map it to a locator? And if so, how is it done?


iguana is a bitcoin node that happens to update a block-explorer-level dataset.
The data structures are optimized for parallel access, so multi-core searches can be used;
however, even with a single core searching linearly (backwards), it is quite fast to find any txid.

Each bundle of 2000 blocks has a hardcoded hash table for all the txids in it, so it is a matter of doing a hash lookup in each bundle until it is found. I don't have timings for fully processing a full block yet, but I don't expect it would take more than a few seconds to update all vins and vouts.

Since txids are already high entropy, there is no need to do an additional hash, so I XOR all 4 64-bit long ints of the txid together to create an index into an open hash table, which is created to never be more than half full, so it will find any match in very few iterations. Since everything is memory mapped, after the initial access to swap it in, each search will take less than a microsecond.
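
As a rough illustration of that lookup, here is a minimal C sketch. The names (bits256_t, txid_table_t, find_txidind) and the "slot stores txidind + 1, 0 means empty" convention are hypothetical, not taken from the iguana sources:

Code:
#include <stdint.h>
#include <string.h>

typedef struct { uint64_t ulongs[4]; } bits256_t;   /* a 32-byte txid as 4 long ints */

/* hypothetical per-bundle table, created to never be more than half full */
typedef struct {
    bits256_t *txids;      /* memory-mapped, read-only array of the bundle's txids */
    uint32_t  *slots;      /* slot -> txidind + 1; 0 marks an empty slot            */
    uint32_t   numslots;   /* power of two, at least 2x the number of txids         */
} txid_table_t;

/* txids are already high entropy, so XOR the four 64-bit words to pick a start slot */
static uint32_t txid_slot(const bits256_t *txid, uint32_t numslots)
{
    uint64_t h = txid->ulongs[0] ^ txid->ulongs[1] ^ txid->ulongs[2] ^ txid->ulongs[3];
    return (uint32_t)(h & (numslots - 1));
}

/* linear probe until the txid is found or an empty slot proves it is absent */
int32_t find_txidind(const txid_table_t *tbl, const bits256_t *txid)
{
    for (uint32_t i = txid_slot(txid, tbl->numslots); ; i = (i + 1) & (tbl->numslots - 1)) {
        uint32_t ind = tbl->slots[i];
        if (ind == 0)
            return -1;                                   /* not in this bundle */
        if (memcmp(&tbl->txids[ind - 1], txid, sizeof(*txid)) == 0)
            return (int32_t)(ind - 1);                   /* txidind inside the bundle */
    }
}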
926  Bitcoin / Development & Technical Discussion / Re: Proof that Proof of Stake is either extremely vulnerable or totally centralised on: March 11, 2016, 06:19:21 PM
This is a very informal proof, because I wanted it to be as readable as possible for the majority of readers. I hope this will finally show why Proof of Stake (PoS) is not a viable consensus design.

OK, now please provide a formal proof for the minority of readers who can't understand an informal one (e.g. me).
@kushti I think the logic used in this thread is: given that we assume A inevitably leads to B, and since A is self-evident, then B is too.

It is hard to argue with that sort of logic, as it allows you to prove conclusively that B is true. It doesn't matter what B is, just as long as A is self-evident.

Like this:

We will assume that above absolute zero temperatures it is inevitable that the moon is made of cheese.

Since we are not all frozen at absolute zero, it is clear that the moon is made of cheese.

I think formally it would be: Assume A -> B and A is true, therefore B is true

James

Well then the burden is to prove A. Why is it assumed "self evident"?
Because it is in the OP, so it has to be true
927  Bitcoin / Development & Technical Discussion / Re: Proof that Proof of Stake is either extremely vulnerable or totally centralised on: March 11, 2016, 02:26:42 PM
This is a very informal proof, because I wanted it to be as readable as possible for the majority of readers. I hope this will finally show why Proof of Stake (PoS) is not a viable consensus design.

OK, now please provide a formal proof for the minority of readers who can't understand an informal one (e.g. me).
@kushti I think the logic used in this thread is: given that we assume A inevitably leads to B, and since A is self-evident, then B is too.

It is hard to argue with that sort of logic, as it allows you to prove conclusively that B is true. It doesn't matter what B is, just as long as A is self-evident.

Like this:

We will assume that above absolute zero temperatures it is inevitable that the moon is made of cheese.

Since we are not all frozen at absolute zero, it is clear that the moon is made of cheese.

I think formally it would be: Assume A -> B and A is true, therefore B is true

James
928  Bitcoin / Development & Technical Discussion / Re: LevelDB reliability? on: March 11, 2016, 01:37:26 PM
LevelDB being stupid is one of the major reasons that people have to reindex on Bitcoin Core crashes. There have been proposals to replace it but so far there are no plans on doing so. However people are working on using different databases in Bitcoin Core and those are being implemented and tested.

Maybe the most reliable DB is no DB at all? Use efficiently encoded read only files that can be directly memory mapped.

https://bitcointalk.org/index.php?topic=1387119.0
https://bitcointalk.org/index.php?topic=1377459.0
https://bitco.in/forum/forums/iguana.23/

James

LevelDB is used to store the UTXO set. How is that read-only?

The UTXO set falls into the write-once category. Once an output is spent, you can't spend it again. The difference with the UTXO set is explained here: https://bitco.in/forum/threads/30mb-utxo-bitmap-uncompressed.941/

So you can calculate the OR'able bitmap for each bundle in parallel (as soon as all its prior bundles are there). Then, to create the current UTXO set, OR the bitmaps together.

What will practically remain volatile is the bitmap, but the overlay bitmap for each bundle is read-only. This makes a UTXO check a matter of finding the index of the vout and checking a bit.

James
929  Bitcoin / Development & Technical Discussion / Re: Using compact indexes instead of hashes as identifiers. on: March 11, 2016, 01:32:19 PM
but yes, use hashes for non-permanent data, index for permanent

I thought of a possible issue with offline signing.  If you want to offline sign a transaction, then you need to include proof that the index refers to a particular transaction output.

Armory already has to include all the inputs into the transaction in order to figure out how much you are spending, so it isn't that big a deal.  The problem is that there is no way to prove that a particular index refers to a particular output.  You would need to include the header chain and that is only SPV safe. 

For offline transactions, using the hash is probably safer, but that doesn't affect internal storage, just if transactions could refer to previous outputs by index.
Maybe I didn't make it clear in this thread. The indexes are used for traversing lists, etc., but all the txids are in the dataset. So the dataset has indexes referring to other indexes (or implicitly by their position), and each index can be converted to its fully expanded form. Tastes great and less calories.

So you can do all the verifications, etc., as it is a local DB/rawfile replacement combined into one. Since it is read-only for the most part, mksquashfs can create a half-sized dataset that also acts to protect it from data corruption. I split out the sigs into separate files, so they can be purged after being validated. I am also thinking of doing the same with pubkeys, so nodes that want to be able to search all pubkeys seen could, but nodes that don't won't need to keep them around.

It works as a lossless codec that stores its data in a way that is ready to do searches without any setup time (for the data it has already processed). All addresses are fully indexed, even multisig/p2sh, so there is no need for importprivkey. Listunspent serially took ~2 milliseconds on a slow laptop. With the parallel datasets, it is well suited to multi-core searching to allow for linear speedups based on the number of cores. Even a GPU could be used for industrial-strength speed, as long as there is a pipeline of requests to avoid the latency of using the GPU per RPC call.
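
For the multi-core case, the per-bundle searches are independent because each bundle is its own read-only dataset. A hedged pthread sketch of the idea; bundle_t, bundle_listunspent and the thread-per-bundle layout are hypothetical stand-ins, not iguana APIs:

Code:
#include <pthread.h>
#include <stdint.h>

#define NUM_BUNDLES 200           /* hypothetical: one read-only dataset per 2000 blocks */

typedef struct bundle bundle_t;   /* opaque, memory-mapped bundle dataset */
extern int64_t bundle_listunspent(const bundle_t *bp, const uint8_t rmd160[20]);

struct search_arg { const bundle_t *bp; const uint8_t *rmd160; int64_t sum; };

static void *search_one_bundle(void *ptr)
{
    struct search_arg *arg = ptr;
    arg->sum = bundle_listunspent(arg->bp, arg->rmd160);   /* purely read-only work */
    return NULL;
}

/* because every bundle is an independent read-only file, the per-bundle searches
   can run on separate cores and the results are simply summed at the end */
int64_t parallel_listunspent(const bundle_t *bundles[NUM_BUNDLES], const uint8_t rmd160[20])
{
    pthread_t tids[NUM_BUNDLES];
    struct search_arg args[NUM_BUNDLES];
    int64_t total = 0;
    for (int i = 0; i < NUM_BUNDLES; i++) {
        args[i].bp = bundles[i]; args[i].rmd160 = rmd160; args[i].sum = 0;
        pthread_create(&tids[i], NULL, search_one_bundle, &args[i]);
    }
    for (int i = 0; i < NUM_BUNDLES; i++) {
        pthread_join(tids[i], NULL);
        total += args[i].sum;
    }
    return total;
}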

James
930  Bitcoin / Development & Technical Discussion / Re: LevelDB reliability? on: March 11, 2016, 01:33:30 AM
Why use a DB for an invariant dataset?
That sounds like an exam question for a DBMS 101 course.

1) independence of logical data from physical storage
2) sharing of the dataset between tasks and machines
3) rapid integrity verification
4) optimization of storage method to match the access patterns
5) maintenance of transactional integrity with related datasets and processes
6) fractional/incremental backup/restore while accessing software is online
7) support for ad-hoc queries without the need to write software
8) ease of integration with new or unrelated software packages
9) compliance with accounting and auditing standards
10) easier gathering of statistics about access patterns

I think those 10 answers would be good enough for an A or A-, maybe a B+ in a really demanding course/school.

OK, you can get an A

However, memory mapped files share a lot of the same advantages you list:

1) independence of logical data from physical storage - yes
2) sharing of the dataset between tasks and machines - yes (you do need both endian forms)
3) rapid integrity verification - yes, even faster, as once verified there is no need to verify again
4) optimization of storage method to match the access patterns - yes that is exactly what has been done
5) maintenance of transactional integrity with related datasets and processes - yes
6) fractional/incremental backup/restore while accessing software is online  - yes

7) support for ad-hoc queries without the need to write software - no
8) ease of integration with new or unrelated software packages - no
9) compliance with accounting and auditing standards - not sure
10) easier gathering of statistics about access patterns - not without custom code

So it depends on whether 7 to 10 trump the other benefits and whether the resources are available to get it working.
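
Much of the above comes from plain POSIX mmap of read-only files, with the OS paging data in on demand and sharing the pages between processes. A minimal sketch; map_bundle is a hypothetical name, not an iguana function:

Code:
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* map one bundle file read-only; the mapping stays valid after close() and the
   same physical pages are shared by every process that maps the file */
void *map_bundle(const char *fname, size_t *lenp)
{
    int fd = open(fname, O_RDONLY);
    if (fd < 0)
        return NULL;
    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return NULL; }
    void *ptr = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (ptr == MAP_FAILED)
        return NULL;
    *lenp = (size_t)st.st_size;
    return ptr;
}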

James
931  Bitcoin / Development & Technical Discussion / Re: LevelDB reliability? on: March 11, 2016, 01:28:26 AM
Bitcoin Core does not store the blockchain in a database (or leveldb) and never has. The blockchain is stored in pre-allocated append-only files on the disk as packed raw blocks, in the same format they're sent across the network. Blocks that get orphaned are just left behind (there are few enough that it hardly matters).

OK, so what is the DB used for? Will everything still work without the DB?

If the dataset isn't changing, all the lookup tables can be hardcoded.
932  Bitcoin / Development & Technical Discussion / Re: Using compact indexes instead of hashes as identifiers. on: March 11, 2016, 01:25:11 AM
canonical encoding means a numbering system for each block, tx, vin and vout, so that the same number references the same one. Since the blocks are ordered, the tx are ordered within each block, and the vins and vouts are ordered within each tx, this is just a matter of iterating through the blockchain in a deterministic way.

Is it just a count of outputs, or <block-height | transaction index | output index>?

I have no idea how this would cause any privacy loss, as it is just using 32-bit integers as pointers to the hashes. The privacy issue was raised as somehow being a reason not to use efficient encoding.

Ahh ok, I guess it is confusion due to the thread split.  I agree, I see no loss in privacy by referring to transactions using historical positions.

With a hard fork, transactions could have both options. If you want to spend a recent transaction, you could refer to the output by hash. For transactions that are buried deeply, you could use the index. With reorgs, the indexes could be invalidated, but that is very low risk for 100+ confirms.
I used to use a 32-bit index for the entire chain, but that doesn't work for parallel sync, plus in a few years it would actually overflow.

Now it is a 32-bit index for txidind, unspentind and spendind (the txids, vouts and vins within each bundle of 2000 blocks),

and an index for the bundle, which is less than 16 bits,

so (bundle, txidind), (bundle, unspentind) and (bundle, spendind) identify the corresponding txid, vout and vin within each bundle.

but yes, use hashes for non-permanent data, index for permanent
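
In C terms the locators described above could look something like this. A hypothetical sketch with illustrative field and type names, not the actual iguana structs (if packed, each locator is roughly the 6 bytes mentioned earlier):

Code:
#include <stdint.h>

/* sub-16-bit bundle number plus a 32-bit index that is only unique within that bundle */
typedef struct { uint16_t bundle; uint32_t txidind;    } txid_locator_t;
typedef struct { uint16_t bundle; uint32_t unspentind; } utxo_locator_t;
typedef struct { uint16_t bundle; uint32_t spendind;   } spend_locator_t;

#define BUNDLE_BLOCKS 2000   /* each bundle covers 2000 blocks */

/* which bundle a given block height falls into */
static inline uint16_t height_to_bundle(uint32_t height)
{
    return (uint16_t)(height / BUNDLE_BLOCKS);
}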
933  Bitcoin / Development & Technical Discussion / Re: LevelDB reliability? on: March 10, 2016, 11:06:13 PM
Thoughts?

In my opinion LevelDB is better than the previous Berkeley DB that Bitcoin used.

Look at the Berkeley DB wallets! They became corrupted because some Mac OS guy renamed his wallet.dat while Bitcoin was running.

Yes, we know that Mac filesystems are a pile of crap.


LevelDB looks like simple key-value storage, so I don't see what exactly the Google engineers screwed up.
Why use a DB for an invariant dataset?
After N blocks, the blockchain doesn't change, right?

So there is no need for the ability to do general DB operations. There are places where a DB is the right thing, but it is like comparing CPU mining to ASIC mining.

The CPU can do any sort of calcs, just like the DB can do any sort of data operations.
The ASIC does one thing, but really fast. A hardcoded dataset is the same: think of it as the ASIC analogue of a DB.

So there is nothing wrong with a DB at all, you just can't compare an ASIC to a CPU.
934  Bitcoin / Development & Technical Discussion / Re: LevelDB reliability? on: March 10, 2016, 09:10:42 PM
LevelDB being stupid is one of the major reasons that people have to reindex on Bitcoin Core crashes. There have been proposals to replace it but so far there are no plans on doing so. However people are working on using different databases in Bitcoin Core and those are being implemented and tested.

Maybe the most reliable DB is no DB at all? Use efficiently encoded read only files that can be directly memory mapped.

https://bitcointalk.org/index.php?topic=1387119.0
https://bitcointalk.org/index.php?topic=1377459.0
https://bitco.in/forum/forums/iguana.23/

James
935  Bitcoin / Development & Technical Discussion / Re: Using compact indexes instead of hashes as identifiers. on: March 10, 2016, 09:06:58 PM
I think we have to admit that a large part of the BTC blockchain has been deanonymized. And until new methods like CT are adopted, this will only get worse.

So rather than fight a losing battle, why not accept that there are people who don't care much about privacy and for whom convenience is more important? In fact, they might be required to be public. The canonical encoding allows the existing blockchain (and future public blockchains) to be encoded much better than any other method, as it ends up as high-entropy compressed 32-bit numbers, vs a 32-byte txid + vout. The savings are much more than 12 bytes: it takes only 6 bytes to encode an unspent, so closer to 30 bytes.

What exactly do you mean by "canonical encoding"?  What is the privacy loss here?
canonical encoding means a numbering system for each block, tx, vin and vout, so that the same number references the same one. Since the blocks are ordered, the tx are ordered within each block, and the vins and vouts are ordered within each tx, this is just a matter of iterating through the blockchain in a deterministic way.

I have no idea how this would cause any privacy loss, as it is just using 32-bit integers as pointers to the hashes. The privacy issue was raised as somehow being a reason not to use efficient encoding.

James

Yes, I understand that if the blockchain reorgs, the 32-bit indexes will need to be recalculated for the ones affected, and orphans have no index.
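
For reference, the deterministic numbering itself is just one ordered pass over the chain. A hedged C sketch, where block_t and the accessor/emit functions are hypothetical stand-ins:

Code:
#include <stdint.h>

typedef struct block block_t;                 /* opaque parsed block, details omitted */

/* hypothetical accessors over the ordered contents of a block */
extern int block_numtx(const block_t *blk);
extern int tx_numvouts(const block_t *blk, int txi);

/* walk the chain once, in order, and hand every vout its canonical 32-bit number;
   emit() is a hypothetical hook that records (height, txi, vouti) -> index */
uint32_t assign_canonical_vout_indexes(const block_t *const blocks[], uint32_t numblocks,
                                       void (*emit)(uint32_t height, int txi, int vouti, uint32_t index))
{
    uint32_t index = 0;
    for (uint32_t ht = 0; ht < numblocks; ht++)
        for (int txi = 0; txi < block_numtx(blocks[ht]); txi++)
            for (int v = 0; v < tx_numvouts(blocks[ht], txi); v++)
                emit(ht, txi, v, index++);    /* same number on every node, every run */
    return index;                             /* total vouts numbered so far */
}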
936  Bitcoin / Development & Technical Discussion / Re: An optimal engine for UTXO db on: March 10, 2016, 05:36:02 PM
Who said there was a scaling problem with bitcoin?
Me. And I am ready to repeat it again.
Please be specific.

What part of bitcoin makes it not able to scale?

I have solved the "takes too long to sync" issue. I think that with interleaving, a much greater tx capacity is possible. Using BTC as the universal clock for all the other cryptos will let smaller chains deal with lower-value transactions. And now there is a way to use the GPU for blockchain operations.

James
937  Bitcoin / Development & Technical Discussion / Re: An optimal engine for UTXO db on: March 10, 2016, 11:55:34 AM
Thanks. I like it.
If you want really fast speed, you can make a CUDA/OpenSSL version and use a dedicated core per bundle, among other optimizations.

With almost all the data being read-only, the biggest challenge of GPU coding (out-of-sync data between cores) is already solved.

It might take a few GPU cards to store the whole dataset, but then any RPC that scans the entire blockchain completes in milliseconds.

Using the CPU for the latest bundle can create a system that keeps up with new blocks.

Who said there was a scaling problem with bitcoin?

James
938  Bitcoin / Development & Technical Discussion / Re: Using compact indexes instead of hashes as identifiers. on: March 10, 2016, 12:05:45 AM
I create separate files that just contain the signatures, so pruning them is a matter of just deleting the files
and yes, the index data is compressible, about 2x
Why not delete all the block files?  Grin
The algorithm would be:
1) check incoming transactions (wild and confirmed) for their validity, except for checking that the inputs are unspent or even exist
2) relay valid transactions to your peers
3) keep only the last 100 (or 200, 500, 2016) blocks of the blockchain on the hard drive

This would save 95% of hard disk space

Sure, for a pruned node that is fine, but I want a full node with the smallest footprint.
Given time, I plan to support all the various modes.
939  Bitcoin / Development & Technical Discussion / Re: Using compact indexes instead of hashes as identifiers. on: March 10, 2016, 12:04:10 AM
Segwit prunes old signatures, and since the signatures are a major source of entropy, it may make the leftover data more compressible.
What?  Grin
I create separate files that just contain the signatures, so pruning them is a matter of just deleting the files

and yes, the index data is compressible, about 2x

What compression are you using?

LRZIP is *very* good for large files (the larger the file, the more redundancies it can find).

(A typical 130MB block.dat will come down to 97-98MB with gzip/bzip2, but can go under 90MB with lrzip.)

I am just using mksquashfs, so I get a compressed read-only filesystem.
This protects all the data from tampering, so there is little need to reverify anything.
The default settings compress the index data to about half size, and the sigs data is in a separate directory now, so after initial validation it can just be deleted, unless you want to run a full relaying node.

I haven't gotten a complete run yet with the latest iteration of data, so I don't have exact sizes. I expect that the non-signature dataset will come in at less than 20GB for the full blockchain. The reason it gets such good compression is that most of the indices are small numbers that occur a lot, over and over. By mapping the high-entropy pubkeys, txids, etc. to a 32-bit index, not only is there an 8x compression, the resulting index data is itself compressible, so probably about a 12x compression overall.

The vins are the bulkiest, but I encode them into a metascript as described in https://bitco.in/forum/threads/vinmetascript-encoding-03494921.942/

James
940  Bitcoin / Development & Technical Discussion / Re: Using compact indexes instead of hashes as identifiers. on: March 09, 2016, 08:28:24 PM
Segwit prunes old signatures, and since the signatures are a major source of entropy, it may make the leftover data more compressible.
What?  Grin
I create separate files that just contain the signatures, so pruning them is a matter of just deleting the files

and yes, the index data is compressible, about 2x