Author Topic: What is stopping pruning from being added to the qt client?  (Read 2733 times)
Peter Todd
Legendary
*
expert
Offline Offline

Activity: 1120
Merit: 1150


View Profile
July 17, 2013, 11:58:09 AM
 #21

What about creating a new node type "block-server".  These nodes maintain historical data only.

That is an existing proposal known as "archive node."

I've got another one, "partial node": http://sourceforge.net/mailarchive/message.php?msg_id=31178377

piotr_n
Legendary
*
Offline Offline

Activity: 2053
Merit: 1354


aka tonikt


View Profile WWW
July 17, 2013, 01:23:58 PM
Last edit: July 18, 2013, 05:01:04 PM by piotr_n
 #22

We've got a rabid one here.

Sorry about that - piotr_n is a bot I wrote to [...]
well - then congratulations, man!
as the first person in the world, you have managed to write a bot that is smarter than its creator.
that's indeed an achievement one could be proud of - a breakthrough in the field of AI, I would say.

but so this post doesn't get deleted for being off-topic (which it should be), let me add something on topic.
I've tried to analyze different approaches to calculating a reliable hash of any UTXO database, and it seems to be pretty straightforward.
obviously one would want to be able to calculate the checksum while having only the UTXO db (without needing to re-parse the entire chain).
obviously one would also want to quickly apply differential changes to the checksum that come from applying each block.
both can be achieved simply by serializing each UTXO record, hashing it, and XOR-ing all the hashes together to get the ID of a UTXO snapshot.
extending the hash size to e.g. 512 bits might be a reasonable option as well, though it would be worth having some numbers to support it.
when an output is spent, its hash is XOR-ed with the checksum again - this way it gets "removed" from the UTXO db ID.
maybe you don't even need to serialize and hash the record - maybe it would be just as safe to use (TxID ^ Vout) as the value - EDIT: sorry, what was I thinking Smiley
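here's a minimal sketch of the add/remove idea in Python, just to illustrate - the record serialization is only a placeholder:

Code:
import hashlib

def record_hash(serialized_utxo):
    # hash one serialized UTXO record down to a 256-bit integer
    return int.from_bytes(hashlib.sha256(serialized_utxo).digest(), "big")

def toggle_output(snapshot_id, serialized_utxo):
    # XOR the record's hash in to "add" an output; XOR it in again to "remove" it when spent
    return snapshot_id ^ record_hash(serialized_utxo)

# starting from an empty set (ID = 0), add an output and later spend it:
snapshot = 0
snapshot = toggle_output(snapshot, b"txid|vout|amount|script")  # output created
snapshot = toggle_output(snapshot, b"txid|vout|amount|script")  # output spent
assert snapshot == 0  # back to the empty-set ID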

as always, feel welcome to slap me with constructive criticism. just please make sure to double-check first what "constructive" means.

Check out gocoin - my original project of full bitcoin node & cold wallet written in Go.
PGP fingerprint: AB9E A551 E262 A87A 13BB  9059 1BE7 B545 CDF3 FD0E
gmaxwell
Moderator
Legendary
*
expert
Offline Offline

Activity: 4172
Merit: 8419



View Profile WWW
July 17, 2013, 05:12:58 PM
 #23

Isn't that what happens with the main client at the moment?  Script is not verified until the client passes the last checkpoint?
No, not quite. The expensive scriptsig validations are skipped, but inflation prevention/fees/etc are all preserved. All of it is run if checkpoint=0 is set from the command line.  My preference, after switching to headers-first sync, is to randomly validate signatures in old blocks in proportion to the computation burying them... and to launch rockets if there is a failure.

Quote
Quote
(1)  Prevents DOS attacks against node synchronization.  Under the original synchronization behavior an attacker can send a node hundreds of gigabytes of difficulty 1 blocks (which, thanks to FPGAs and ASICs, can be very cheaply produced).  This difficulty 1 chain might, once synced far enough, in _theory_ eventually have more sum difficulty than the real chain, so a checkpointless node can be forced to evaluate it.  Checkpoints kill this attack dead; they aren't the only way to kill it, but they're a simple and reliable one.
That is a 51% attack.  POW builds directly with hashing power.  Building a 1 difficulty chain with more POW than the main chain is just as hard as building a shorter chain that has more POW.
I suppose I was unclear: I'm pointing out that we must evaluate a competing chain with lots of diff 1 blocks in order to determine if it gets ahead. With the current way sync works, doing this opens up a DOS attack because the data is fetched even when the chain never comes close to getting ahead.

Quote
This could also be achieved by setting a lower bound on the total POW rather than adding a checkpoint.
As my message points out…

Quote
This allows much faster detecting of dishonest nodes.
I don't see how it does. It would make your sync fail to produce connectable results in the natural and nonmalicious case of having peers on different branches, and would require a bunch of code to switch back to scalar fetching from a single peer once an unconnectable result was detected.  Might be useful, but I'm not sure it would be worth the added client complexity.

Quote
If/when there are more clients, then they should make sure that they only ever add checkpoints after they have been agreed (either by adding them to the reference client or via some other consensus system).
Certainly there should be none placed where there is a risk of an honest competing fork existing.

Quote
It seems that rigid checkpointing gives a performance boost as you can skip script checking.
Nah, you don't need checkpoints in order to do that. Though doing randomized checking completely requires proof of treachery.

Quote
However, it allows the node to estimate the difficulty of the chain and also detect hostile nodes that lie about their chain.
Have you seen Amiller's high hash highway stuff?
justusranvier
Legendary
*
Offline Offline

Activity: 1400
Merit: 1009



View Profile
July 17, 2013, 05:21:39 PM
 #24

All nodes could become archive nodes which each store only a fraction of the historical data.

1) Interpret the transaction hash key space as being circular.
2) Pick a random hash value; call this the target hash.
3) Design a probability function with the shape of a bell curve which produces a value of 1 at the target hash and on average would match 1% (or any other desired fraction) of the key space.
4) Apply the probability function to the hash of any prunable transaction the node is aware of. Permanently store transactions for which the function is true and discard the rest (alternately, save all of them but use the probability function to selectively discard transactions when disk space is low); see the sketch after this list.
5) As long as the total number of nodes is a sufficiently high multiple of 1/(desired fraction), and the nodes have sufficient disk space on average to hold their desired fraction of the key space, then all of the historical transactions will probably remain retrievable.
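A rough sketch of steps 2-4 in Python, assuming SHA-256d transaction ids and a Gaussian-shaped keep probability (the constants are purely illustrative):

Code:
import hashlib, math, random

KEYSPACE = 2 ** 256   # size of the circular txid space (step 1)
FRACTION = 0.01       # desired average fraction of the key space to match
# width chosen so the bell curve's integral over the key space is FRACTION * KEYSPACE
SIGMA = FRACTION * KEYSPACE / math.sqrt(math.pi)

target = random.randrange(KEYSPACE)   # step 2: this node's random target hash

def keep_probability(txid_int):
    # step 3: equals 1 at the target, falls off like a bell curve with circular distance
    d = abs(txid_int - target)
    d = min(d, KEYSPACE - d)
    return math.exp(-(d / SIGMA) ** 2)

def should_store(raw_tx):
    # step 4: probabilistic keep/discard decision for one prunable transaction
    txid = hashlib.sha256(hashlib.sha256(raw_tx).digest()).digest()
    return random.random() < keep_probability(int.from_bytes(txid, "big"))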
piotr_n
Legendary
*
Offline Offline

Activity: 2053
Merit: 1354


aka tonikt


View Profile WWW
July 17, 2013, 05:25:15 PM
 #25

@justusranvier, I think it's a good idea, but rather for bitcoin 3.0
IMHO, there is no way you could introduce such a system in the foreseeable future.
It just seems too complex. And people won't like it.

Check out gocoin - my original project of full bitcoin node & cold wallet written in Go.
PGP fingerprint: AB9E A551 E262 A87A 13BB  9059 1BE7 B545 CDF3 FD0E
gmaxwell
Moderator
Legendary
*
expert
Offline Offline

Activity: 4172
Merit: 8419



View Profile WWW
July 17, 2013, 05:46:51 PM
 #26

All nodes could become archive nodes which each store only a fraction of the historical data.

1) Interpret the transaction hash key space as being circular.
2) Pick a random hash value, call this the target hash.
[...]
That would have very poor locality, resulting in a lot of communications overhead. Because it would be probabilistic, it would also require multihop probing to find nodes that actually store a particular block.  Please think about the actual access patterns which are needed to perform useful processing on the data. For historical data, stuff beyond a few weeks old, "Pick a random starting point at least $storage from the end, keep $storage worth of block data starting there" is ~optimal considering the access patterns, and a lot simpler, and it still maintains the relation that so long as the aggregate storage is great enough the probability of being unable to find some of the data is negligible.
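A sketch of that simpler rule, with made-up parameter names (block counts standing in for $storage):

Code:
import random

def choose_archive_range(chain_height, storage_blocks, recent_window=2016):
    # keep one random contiguous run of old blocks; across many nodes the
    # aggregate coverage of the chain is what keeps everything retrievable
    archival_end = chain_height - recent_window   # skip the recent blocks everyone keeps anyway
    # assumes storage_blocks < archival_end, i.e. the node can't hold the whole archive
    start = random.randrange(archival_end - storage_blocks + 1)
    return start, start + storage_blocks          # store blocks [start, start + storage_blocks)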

It's not necessary to cargo-cult freenet's behavior. Our task is different and fundamentally easier in some regards.
justusranvier
Legendary
*
Offline Offline

Activity: 1400
Merit: 1009



View Profile
July 17, 2013, 06:18:46 PM
 #27

That would have very poor locality, resulting in a lot of communications overhead. Because it would be probabilistic it would also multihop probing to find nodes that actually store a particular block.
I was suggesting this as a way of storing prunable transactions, not blocks. Why is a multihop probe unreasonable when it comes to retrieving archival data? Also, it's possible to improve the locality with path folding.

It's not necessary to cargo-cult freenet's behavior. Our task is different and fundamentally easier in some regards.
Of course it's easier, because you don't need to include strong anonymity in the feature set. Still, their data store, which is deployed and working in practice with ~30 TB of capacity, solves a lot of the same problems.
gmaxwell
Moderator
Legendary
*
expert
Offline Offline

Activity: 4172
Merit: 8419



View Profile WWW
July 17, 2013, 06:59:41 PM
 #28

I was suggesting this as a way of storing prunable transactions, not blocks.
There is no need in the bitcoin system for uncorrelated random access to transactions by transaction ID. It's a query that's not needed for anything in the system.

Quote
Why is a multihop probe unreasonable when it comes to retrieving archival data?
Because it lowers reliability and makes DOS attack resistance hard. It's part of why the sybil problem is not solved in DHTs on anonymous networks. (By anonymous there I mean that anyone can come or go or connect at any point; freenet opennet is actually insecure, at least in theory.)  Moreover, it's unreasonable because it doesn't actually improve anything over a simpler solution.
justusranvier
Legendary
*
Offline Offline

Activity: 1400
Merit: 1009



View Profile
July 17, 2013, 08:33:17 PM
Last edit: July 17, 2013, 08:52:12 PM by gmaxwell
 #29

There is no need in the bitcoin system for uncorrelated random access to transactions by transaction ID. It's a query thats not needed for anything in the system.
When blocks become very large it will be more efficient to download them in parallel from multiple peers. Allowing that means you've got to subdivide the blocks somehow, so you might as well subdivide at the transaction level.

In addition, I'm assuming that all nodes are going to maintain a complete copy of the UTXO set at all times. That means that if they want to download old blocks, the only data they should need to fetch from the network is the pruned transactions from those blocks.
gmaxwell
Moderator
Legendary
*
expert
Offline Offline

Activity: 4172
Merit: 8419



View Profile WWW
July 17, 2013, 08:56:45 PM
 #30

When blocks become very large it will be more efficient to download them in parallel from multiple peers. Allowing that means you've got to subdivide the blocks somehow, might as well subdivide at the transaction level.
Without debating the efficiency of parallel fetch (which is, in fact, debatable), and setting aside that it's unrelated to archival information: if you want hunks of the latest blocks from your peers you can simply ask them for them. If you want to think of it as a hashtable it's a D1HT, but it seems unnecessary to think of it that way. Besides, prefetching the transaction list requires a synchronous delay.  Simply asking your peers for a blind orthogonal 1/Nth of block $LATEST is more efficient.

As a toy example, if you have 8 peers and hear about a block, you could ask each peer to give you transaction n where (n+x)%8==0, where x is their peer index. If not all 8 peers have received the block you can go back and re-request their fraction from the ones that do.  Additionally, peers can filter their responses based on transactions they've already sent you or seen you advertise. By grabbing blind hunks like this you avoid transmitting a transaction list (which is a substantial fraction of the total size of the transactions) and avoid round trips both to get the list and to negotiate who has/doesn't have what.  Though this is something of a toy example, it's fairly close to the functionality our bloom-filtered getblock already provides.
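The assignment rule from that toy example, in a couple of lines of Python:

Code:
NUM_PEERS = 8

def peer_serves(n, x):
    # peer x is asked for transaction n of the new block when (n + x) % 8 == 0
    return (n + x) % NUM_PEERS == 0

# transactions of a 20-transaction block we would request from peer 3
print([n for n in range(20) if peer_serves(n, 3)])   # -> [5, 13]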

Quote
In addition I'm assuming that all nodes are going to maintain a complete copy of the UTXO set at all times. That means if they wanted to download old blocks the only data they should need to fetch from the network is the pruned transactions from those blocks.
If someone has a complete and trusted UTXO set then what need would you expect them to have to fetch archival data?
TierNolan
Legendary
*
Offline Offline

Activity: 1232
Merit: 1083


View Profile
July 17, 2013, 08:59:46 PM
Last edit: July 17, 2013, 09:43:04 PM by TierNolan
 #31

Have you seen Amiller's high hash highway stuff?

No I hadn't, that's interesting.  

From what I can see, his proposal is (effectively) to store the lowest hash ever produced by the network.

In much the same way that having 10 leading zeros proves you have done about 1024 hashes, the lowest hash ever produced by the network shows that the network has done that much work.
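The arithmetic behind that, for concreteness:

Code:
def expected_hashes(leading_zero_bits):
    # a hash with z leading zero bits shows up once per 2**z attempts on average,
    # so the single lowest hash ever found is a compact summary of the work done
    return 2 ** leading_zero_bits

assert expected_hashes(10) == 1024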

1LxbG5cKXzTwZg9mjL3gaRE835uNQEteWF
justusranvier
Legendary
*
Offline Offline

Activity: 1400
Merit: 1009



View Profile
July 17, 2013, 09:00:34 PM
 #32

If someone has a complete and trusted UTXO set then what need would you expect them to have to fetch archival data?
It could be a new node which has just entered the network by downloading the UTXO set, and wants to fully validate the chain back to some point in the past to increase its assurance that it really is operating on the correct branch.
TierNolan
Legendary
*
Offline Offline

Activity: 1232
Merit: 1083


View Profile
July 18, 2013, 12:20:39 AM
 #33

How much flexibility do ASIC chips have for header changes?  Does it have to be 80 bytes with the nonce in the exact right location, or did they make them more general?

The current scheme is

80 bytes is expanded to 128 bytes (two 64-byte blocks)

The header is broken up (indexes are in bytes)

Block 1: <version[3:0]> <prev block[31:0]> <merkle[31:4]> = 64 bytes
Block 2: <merkle[3:0]> <timestamp[3:0]> <bits[3:0]> <nonce[3:0]> <zeros[39:0]> <length[7:0]> = 64 bytes

The first block doesn't contain the nonce, so doesn't change as the counter runs.

This means that they need to perform the following.

First SHA256 pass

sha256(block one) (can be pre-calculated)
sha256(block two)

result: 32 bytes

Second SHA256 pass
sha256(32 bytes rounded up to 1 block)
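For reference, the whole two-pass scheme in Python (hashlib just reprocesses both 64-byte blocks on every call, whereas the hardware caches the state after block one; field sizes as above, with prev_block and merkle_root as 32-byte strings):

Code:
import hashlib, struct

def block_pow_hash(version, prev_block, merkle_root, timestamp, bits, nonce):
    # assemble the 80-byte header: 4 + 32 + 32 + 4 + 4 + 4 bytes
    header = (struct.pack("<I", version) + prev_block + merkle_root +
              struct.pack("<III", timestamp, bits, nonce))
    assert len(header) == 80
    # bytes 0..63 are SHA-256 block one (nonce-free); bytes 64..79 plus padding
    # form block two, which is the only part that changes as the nonce counts up
    first_pass = hashlib.sha256(header).digest()   # first pass, two blocks
    return hashlib.sha256(first_pass).digest()     # second pass over 32 bytes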

They are unlikely to have made the first block flexible, since that would mean unnecessary hardware for that SHA round.

For the 2nd block, there are a few ways to do it.

Allowing the counter to be moved means they would need to add multiplexers to slide the nonce.  This creates an incentive not to include that flexibility.  However, it might be worth it.

Changing the non-nonce inputs to the 2nd block would just require flip-flops, so the cost is low.  There might be some logic optimisations that can't be performed if those bits aren't hardwired to zero.  This would seem to be well worth it, since it maintains forward compatibility.

The second stage isn't affected by the exact nature of the header, so could support a header change.

There are 40 bytes remaining in the header before the first hash would require 3 rounds.  That puts a maximum limit of 120 bytes on the block header, with a rule preventing the nonce from being moved.  However, adding a nonce2 would probably be OK.  It would only be a problem if the communication speed from controller to ASIC was so slow that 4 billion hashes could be completed before the next nonce2 was sent.

1LxbG5cKXzTwZg9mjL3gaRE835uNQEteWF
TierNolan
Legendary
*
Offline Offline

Activity: 1232
Merit: 1083


View Profile
July 18, 2013, 04:26:10 PM
Last edit: July 18, 2013, 06:35:47 PM by TierNolan
 #34

(Re: Amiller's high hash highway)

No I hadn't, that's interesting.  

Ok, reading more.  The rule is that you have 2 links: one to the previous block, and one to the best (lowest) hash before the previous block that beats the previous block.  He uses the number of leading zeros rather than the hash value for some reason.  He has a big discussion that there is probably only 1 "best" hash in that context.  Using the full value means that you are pretty much guaranteed that.

If 100 billion hashes are performed, then each hash has an equal chance of being the best.  The fact that there is a difficulty threshold means some never become block headers, but the best ones always will, so it doesn't affect the result.

This means that the best hash so far (ignoring the previous block for some reason) will on average be 50% of the way back, in terms of POW.  It splits the chain into 2 equally important halves.  Each step backwards decreases the total POW by 50% on average, so it takes log(total POW / POW(genesis)) steps to get back to the genesis block.
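Back-of-the-envelope version of that step count:

Code:
import math

def up_link_steps(total_pow, genesis_pow=1):
    # each step back halves the remaining POW on average, so the walk to the
    # genesis block takes roughly log2 of the total-to-genesis work ratio
    return math.ceil(math.log2(total_pow / genesis_pow))

print(up_link_steps(2 ** 70))   # a chain representing ~2**70 hashes -> about 70 steps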

Blocks that don't have the 2nd link would have to be ignored (so assumed to have a worthless (infinitely high) hash).  So, blocks after 270k wouldn't be allowed to link to blocks before 270k and 270k would link to the genesis block.  Effectively, the blocks from 1 to 269999 would be considered to have an infinite hash.

[edit]

Reading even more.  The reason for taking zeros is so that collisions can occur.

If the highest hash has 50 zeros, then there will be 2 hashes with 49 and 4 hashes with 48 and so on.

The "up" link goes to the most recent block with more zeros than this block (or the genesis block, if none).

If the current block had 47 zeros, then you could find all the 48-zero blocks by following the chain:

Code:
// walk back from the current block, collecting every block with exactly 48 zeros
while (block != genesis) {
  if (block.zeros == 48) {
    addToList(block);        // found one; continue from its predecessor
    block = block.prev;
  } else if (block.zeros < 48) {
    block = block.up;        // safe: the up link only skips blocks with <= this block's zeros
  } else {
    block = block.prev;      // must not use the up link here, it could skip a 48-zero block
  }
}

If the current block has fewer than 48 zeros, you go upwards.  The up link only skips blocks with zeros less than or equal to the current block's, so it will never skip a 48 block.

If the current block has 48, then add it to the list and go to the previous block.

If the current block has more than 48, you need to go to the previous block too, since using the up link could cause 48's to be skipped.

This can be used to find a chain of blocks at each difficulty level.  That gives more data points, so less variance when working out the total difficulty.

However, even just showing the 50-zero header would be strong evidence of the total hashing power.  Its up link would point directly to the genesis block.  It would also show the minimum work against that genesis block.

1LxbG5cKXzTwZg9mjL3gaRE835uNQEteWF
mustyoshi (OP)
Sr. Member
****
Offline Offline

Activity: 287
Merit: 250



View Profile
July 21, 2013, 05:38:43 AM
 #35

I am starting to think the only way to prune effectively is to create an archive node type, with a rule for lightnodes that they always keep a connection or route to an archive node.

Transactions' inputs would only be pruned after the coinbase in their block became spendable; that would be the rule so that all lightnodes have the same pruned chain.

When a new node connected, it would either find an archive node and start the whole process, or query lightnodes to determine a consensus on the legitimacy of a pruned block. Using the archival node would be preferred since it is verifiable. To ensure archival nodes exist in sufficient quantities, perhaps there could be a system to buy access to them, thus making them an avenue for profit?