Bitcoin Forum

Bitcoin => Development & Technical Discussion => Topic started by: piuk on November 25, 2011, 05:42:17 PM



Title: A proposal for a scalable blockchain.
Post by: piuk on November 25, 2011, 05:42:17 PM
The problem:

The blockchain will not scale how it is used currently. There is some mention of pruning unspent outputs mention on https://en.bitcoin.it/wiki/Scalability (https://en.bitcoin.it/wiki/Scalability) however this method still requires storage of all blockheaders, meaning there is still a unlimited cap on the blockchain size. Merkel tree pruning will not help to any significant extent as you maybe be able to tell if a transaction is in a block, however you cannot validate that block without all transactions. If lightweight clients cannot validate blocks then they cannot mine, relay blocks or relay transactions there is almost not point to them validating anything at all and might as well use a centralised blockchain.

a) A smaller blockchain helps lower the barrier of entry for the new users.
b) With less risk of blockchain bloat transaction fees could be lowered
c) The larger the blockchain the less users who will run the client and the more centralised the network becomes.

Proposed solution.

At certain points in time the client generates a snapshot of the of every unspent tx output in the chain. This snapshot encapsulates the state of the blockchain upto, but not including, that block.  When a miner produces a block he generates a SHA256 hash of this ledger and includes the hash it in the blocks coinbase.

When a client begins the initial block chain download it starts from the chain head and works backwards. The client downloads a minimum of 2016 blocks before it will accept a ledger hash. If there is a fork in the chain the client will continue to download blocks until it finds a pair of blocks that at least one ledger hash can be agreed upon. When an identical ledger is found the chain with the best proof of work wins. When the client accepts a hash it will ask the node to provide it with the full ledger corresponding to it which can be self hashed and verified. If the node doesn't have the ledger for that hash it may ask other nodes, if no nodes have a copy then it should continue to download past blocks until it can find a hash and a full copy of the ledger.

To validate a transaction the client locates the each txIn outpoint in the unspent ledger and checks the corresponding script for validity. The client checks the validity of a transaction by looking at the txOutputs in it's latest ledger and at in the transactions included in blocks after. Therefore nodes do need to not generate a balance sheet every transaction instead they would keep a balance sheet for approximately two weeks (2016 blocks) before regenerating. Two weeks has been chosen as a base value because it provides enough blocks to use for difficulty targeting, however nodes are free to keep more or less blocks depending on their storage capacity.

When the client decides it is time to generate a new ledger it looks through the chain for a more recent block which has a ledger hash in it's coinbase and is at least 2016 blocks behind the chain head. It then generates a new ledge sheet for that block and checks that the hash matches. It if matches then it is free to purge all transactions/blocks before that time. If the hash does not match then it is important to note the client does not reject the chain, as long as the proof of work is valid. The order of transactions is already decided by the order in the blockchain so the client would simply wait until a miner produces a hash it can agree with, it should not purge transactions until a hash is found. Miners may want to keep blocks for a longer period of time to ensure they have the necessary proof of work should it be needed.

Would this fork the chain?

No. Miners are free to insert whatever data they like into their block's coinbase. Clients that wish to hold a entire blockchain history can simply ignore it.

How much data would clients need to hold?

Quote
Approximately 4.5 million txOuts and 3.3 million txIns - so ~1.2 million unspent outputs.

At the present blockchain size, the ledger would consume at most:

(256 + 160 + 16 + 64) * 1.2 million = 71MB

+ Approximatly two weeks worth of blocks = 100 MB total

This is the initial estimate with compression it maybe possible to halve this value.

Could you mine without the entire blockchain?
Yes. The network could operate fully without any node having the entire blockchain. It is possible that a chain fork could go so far into the past that no nodes have a copy of the chain long enough to resolve the split, however this is extremely unlikely without a malicious attacker having 51% hashing power for a significant period of time.

How would this be adopted, would all miners need to switch immediately?
There needs to be at least one miner producing a ledger hash around every two weeks. So initially this would be possible to implement with only a small pool adopting the scheme. The more frequent miners produce a ledger the more efficiently clients will be able to prune old transactions.

File format
The initial proposed file format would simply be a dump of all unspent txOutputs, in the order they appeared in the blockchain, in the same format as they are serialised over the nework. This has the advantage that any bitcoin client that participates on a network level can decode the file with minimum effort.

The file will probably need to be indexed after it is downloaded as it will not be suitable for locating a txOutput efficiently. The file format for the ledger will be included in the coinbase along with the hash. In the file there will be many duplicate scripts and transaction hashes giving the possibility of much greater compression in future.

Coinbase
Magic Value - File format - Ledger size - Hash
uint32_t, uint16_t, uint64_t, uint256_t

** magic value is a flag indicating this coinbase holds a ledger hash


/Discuss. Feel free to point out any glaringly obvious flaws :)



Title: Re: A proposal for a scalable blockchain.
Post by: btc_artist on November 25, 2011, 05:44:53 PM
Don't have any comments at the moment, but this looks like an interesting proposal.

EDIT: Upon second reading, I don't see any glaringly obvious flaws and it looks like it might work.  I am, however, very much a neophyte in terms of understanding the bitcoin protocol.


Title: Re: A proposal for a scalable blockchain.
Post by: notme on November 25, 2011, 05:54:40 PM
Block contents for blocks not containing address you own can be pruned.  That's the whole purpose of the merkle tree structure the blockchain uses.  Future clients will do this and only some nodes will hold the entire chain.


Title: Re: A proposal for a scalable blockchain.
Post by: terrytibbs on November 25, 2011, 06:06:58 PM
watching this


Title: Re: A proposal for a scalable blockchain.
Post by: btc_artist on November 25, 2011, 06:39:48 PM
Block contents for blocks not containing address you own can be pruned.  That's the whole purpose of the merkle tree structure the blockchain uses.  Future clients will do this and only some nodes will hold the entire chain.
This would mean the system would depend on centralized node(s) to always have the full block chain, since no clients can be expected to have any given block at any given time, right?  I don't know if centralization like that would be a good idea.


Title: Re: A proposal for a scalable blockchain.
Post by: notme on November 25, 2011, 06:42:04 PM
Block contents for blocks not containing address you own can be pruned.  That's the whole purpose of the merkle tree structure the blockchain uses.  Future clients will do this and only some nodes will hold the entire chain.
This would mean the system would depend on centralized node(s) to always have the full block chain, since no clients can be expected to have any given block at any given time, right?  I don't know if centralization like that would be a good idea.

Anyone can still mirror the entire chain, most likely including myself.  This is not centralization.


Title: Re: A proposal for a scalable blockchain.
Post by: btc_artist on November 25, 2011, 06:44:37 PM
Block contents for blocks not containing address you own can be pruned.  That's the whole purpose of the merkle tree structure the blockchain uses.  Future clients will do this and only some nodes will hold the entire chain.
This would mean the system would depend on centralized node(s) to always have the full block chain, since no clients can be expected to have any given block at any given time, right?  I don't know if centralization like that would be a good idea.

Anyone can still mirror the entire chain, most likely including myself.  This is not centralization.
But would there be any built-in incentive to mirror the block chain?  If the default client wouldn't, then that might not be a good direction to go.


Title: Re: A proposal for a scalable blockchain.
Post by: bc on November 25, 2011, 06:47:47 PM
Fascinating idea - if I understand it. It sounds like this proposal is sort of like an "oral history" of a block chain - as long as enough recently-connected clients are around to testify as to the "recent" history. Is that an accurate description?

So I'm sure that I understand this:

a) when would it be "safe" to "forget" a genesis block from such a blockchain? As soon as all coins generated by it have been spent, and 51% of clients have learned of these spends (received the blocks they're contained in)?

b) IF 51% of the network were forced offline for 2+ weeks, could a malevolent actor with 51% hash power step in and present a complete two-week false history?


Title: Re: A proposal for a scalable blockchain.
Post by: notme on November 25, 2011, 06:55:54 PM
Block contents for blocks not containing address you own can be pruned.  That's the whole purpose of the merkle tree structure the blockchain uses.  Future clients will do this and only some nodes will hold the entire chain.
This would mean the system would depend on centralized node(s) to always have the full block chain, since no clients can be expected to have any given block at any given time, right?  I don't know if centralization like that would be a good idea.

Anyone can still mirror the entire chain, most likely including myself.  This is not centralization.
But would there be any built-in incentive to mirror the block chain?  If the default client wouldn't, then that might not be a good direction to go.

Whatever... waste your time on a solved problem.  I'm tired of beating my head against this particular wall.  I never said the reference client should have this as default, but there is room for many different clients.  Personally, I'm going to focus on solving the lack of usefulness for bitcoin before worrying about what might happen when I and others work our butts off to get us to the point where this discussion even matters.  And I'm pretty sure the Satoshi solution is the way to go.  Have you read the whitepaper?  I highly recommend it . http://www.bitcoin.org/bitcoin.pdf


Title: Re: A proposal for a scalable blockchain.
Post by: btc_artist on November 25, 2011, 07:00:43 PM
Block contents for blocks not containing address you own can be pruned.  That's the whole purpose of the merkle tree structure the blockchain uses.  Future clients will do this and only some nodes will hold the entire chain.
This would mean the system would depend on centralized node(s) to always have the full block chain, since no clients can be expected to have any given block at any given time, right?  I don't know if centralization like that would be a good idea.

Anyone can still mirror the entire chain, most likely including myself.  This is not centralization.
But would there be any built-in incentive to mirror the block chain?  If the default client wouldn't, then that might not be a good direction to go.

Whatever... waste your time on a solved problem.  I'm tired of beating my head against this particular wall.  I never said the reference client should have this as default, but there is room for many different clients.  Personally, I'm going to focus on solving the lack of usefulness for bitcoin before worrying about what might happen when I and others work our butts off to get us to the point where this discussion even matters.  And I'm pretty sure the Satoshi solution is the way to go.  Have you read the whitepaper?  I highly recommend it . http://www.bitcoin.org/bitcoin.pdf
I'm not convinced it's solved.  But I agree with you that bitcoin needs to actually become useful and be adopted before we have to worry about these things.


Title: Re: A proposal for a scalable blockchain.
Post by: piuk on November 25, 2011, 07:44:54 PM
Fascinating idea - if I understand it. It sounds like this proposal is sort of like an "oral history" of a block chain - as long as enough recently-connected clients are around to testify as to the "recent" history. Is that an accurate description?

Sort of miners would be generating a hash that says "this is the result of all transactions at this block", as long as enough hashing power agrees with this state then it is accepted.

a) when would it be "safe" to "forget" a genesis block from such a blockchain? As soon as all coins generated by it have been spent, and 51% of clients have learned of these spends (received the blocks they're contained in)?

There would be no 100% safe point. I guess the idea would be most clients would hold blocks from the past few weeks (possibly less, few days) but miners would would hold blocks for a much longer period. If your holding the blocks for the past year then an attacker would need to put in a years worth of hashing power to beat it.

b) IF 51% of the network were forced offline for 2+ weeks, could a malevolent actor with 51% hash power step in and present a complete two-week false history?

Yes assuming every node only had the past two weeks blocks.

Whatever... waste your time on a solved problem.  I'm tired of beating my head against this particular wall.  I never said the reference client should have this as default, but there is room for many different clients.  Personally, I'm going to focus on solving the lack of usefulness for bitcoin before worrying about what might happen when I and others work our butts off to get us to the point where this discussion even matters.  And I'm pretty sure the Satoshi solution is the way to go.  Have you read the whitepaper?  I highly recommend it . http://www.bitcoin.org/bitcoin.pdf

Satoshi's paper is pretty vague on this, where is it explained how you prove a transaction was in a block with the merkel tree? With pruning you still have to hold all unspent outputs, including the transaction hash and scriptPubKey. This method clusters unspent txOutputs and no longer requires the transaction hash.

Edit: Even if you can prove a transaction is in a block with the merkel tree you cannot validate a block without all transactions. If you can't validate a block then how do you know you haven't been given false data?

If clients cannot validate a block with only the headers then what is to stop a malicious attacker generating a set of block headers with fake hashes and difficulty targets. There would be no way for nodes to determine which chain is valid without downloading the transactions.


Title: Re: A proposal for a scalable blockchain.
Post by: Skybuck on November 25, 2011, 07:59:17 PM
I completely do not understand your proposal it's enormously complex and relies on additional communication providing data which might or might not be available. But reading your ideas I have my own idea:
(Editi/Addition:Ok, now I understand your proposal better, I didn't know what a ledger is lol ;) I think our ideas are similiar, except my idea was to create a new protocol to exchange the ledger hashes, apperently your idea is to embed those into the block chains, so no new protocol would be necessary, However a drawback of your idea would be that only miners get to verify the ledger/balance sheet, an obvious flaw I think ;) It needs to be seperated so everybody can have a vote on it ! ;) Then again, blocks are the way the network protocol and verification works, this could mean miners are now in control and decide what transactions are valid and which are not, those it seems bitcoin is coming down crashing and burning it's no longer p2p, it's no longer everybody in control, only the miners are now in control, probably a very dangerous situation ;) at least non-miners can still verify, but rejecting will be useless it seems, since they can never win).

How about this instead:

1. Instead of storing all transactions which ever occured, a point in historic time is chosen, where the software makes up the balances of all bitcoin addresses.

2. Bitcoin addresses which have turned into zero balances are thrown away.

3. Instead of storing the transactions, the balances are stored, which could cut back on data.

What kind of problems could this solution face and what could be the additional solutions:

1. Somebody could change it's own balance in it's own data, to give himself a million dollars.

This would then conflict with the datasets of others.

An idea could be to calculate a hash over all bit coin address balances.

This hash is then broadcasted throughout the system.

The number of confirmations is tracked.

If the majority agrees that the hash is indeed correct it is accepted into a "balance chain".

To make it a little bit more difficult to fake this balance chain, the hash could follow the principle of "difficulty", except since there is no rush, the difficulty could be set 100x times the current difficulty.
(Perhaps the difficult setting for the balance should match the number of "transaction/blocks collapses" * "current difficulty". In other words, blocks are calculated every 10 minutes, the balance block is calculated every 1000 blocks. So the difficulty for the balance chain is 1000x the block difficulty, which should keep both in sync).

Seems simply enough idea.

In principle there is probably no difference between "storing transactions" or "storing the sum of all those transactions" ?!?!?

Except that those transactions are "hashed into a block chain".

Well the same can be done with a "balance chain".

Perhaps it should also become a "racing balance chain" where the longest balance chain wins.

The difference is however: the balance chain is much harder to calculate.

Also the balance chain lags behind the transaction chain, by for example 1000 or 2000 blocks.

So the transaction chain gets a chance to stabilize, so the balance chain can work on stabilized transaction/block data.


Title: Re: A proposal for a scalable blockchain.
Post by: Akemashite Omedetou on November 25, 2011, 09:53:01 PM
Quote
3. Instead of storing the transactions, the balances are stored, which could cut back on data.

This would require quite a rework on the entire bitcoin network.
Right now it is based on transactions not just for the sake of it, but for flexibility which is achieved with scripts. Basically a transaction can have many different inputs and many different outputs, and many different conditions that will have to be med to claim the outputs. Just throwing this all away and simply storing balances of each address would not be compatible with this and would mean we would have to rework the whole concept.
Cool idea though.


Title: Re: A proposal for a scalable blockchain.
Post by: notme on November 25, 2011, 10:06:26 PM
To be clear, merkle pruning only applies to clients that need to conserve space.  You still need the entire chain to mine.


Title: Re: A proposal for a scalable blockchain.
Post by: piuk on November 26, 2011, 11:34:38 AM
I've modified to the original proposal to simplify to process of generating a ledger.

I don't believe the merkel tree pruning proposal by Satoshi is workable. If a client cannot validate blocks then it cannot be trusted to have any network participation.


Title: Re: A proposal for a scalable blockchain.
Post by: ThomasV on November 26, 2011, 11:56:42 AM
interesting


Title: Re: A proposal for a scalable blockchain.
Post by: gmaxwell on November 26, 2011, 12:00:58 PM
however this method still requires storage of all blockheaders, meaning there is still a unlimited cap on the blockchain size.

One hundred years of headers is about 400 mbytes of data.

Why are you wasting our time with this drivel?

Quote
At certain points in time the client generates a snapshot of the of every unspent tx output in the chain.

Superior versions of what you're describing have been suggested before: https://bitcointalk.org/index.php?topic=21995.0

I say superior, because arranging the ledger summary into a hash tree allows nodes to participate without even knowing the complete ledger— other nodes can present ledger fragments to them with the branches which connect the ledger to the tree root— and the whole ledger is constantly current. Bytecoin went further and suggested that you can actually flip the direction of the chain and track the ledgers alone, leaving the coin holders to track the fragments connecting their own coins to the chain.

But none of this is required because of block header sizes.


Title: Re: A proposal for a scalable blockchain.
Post by: gmaxwell on November 26, 2011, 12:09:07 PM
To be clear, merkle pruning only applies to clients that need to conserve space.  You still need the entire chain to mine.

You can prune all the spent outputs and mine away. What you can't do is run a lite headers-only node to mine— unless you just want to mine for the subsidy and not process any transactions. Though with a commitment to open transactions in the prior block, you could mine txn which provide the connecting fragments for the prior block for all of their inputs.


Title: Re: A proposal for a scalable blockchain.
Post by: P4man on November 26, 2011, 12:15:53 PM
Disclaimer: I have no idea what Im talking about.

With that out of the way, would it be possible (or even a good idea) to have all clients store the most recent blocks + a random subset, say 5% of the blockchain? The entire network would still hold countless copies of the blockchain, and as the network (and blockchain) grows you could reduce the subset each client has.


Title: Re: A proposal for a scalable blockchain.
Post by: cbeast on November 26, 2011, 12:23:25 PM
Personally, I'm going to focus on solving the lack of usefulness for bitcoin before worrying about what might happen when I and others work our butts off to get us to the point where this discussion even matters.

+∞

Explosive growth in Bitcoin use = explosive growth in Bitcoin value. Internet speed and storage space are growing exponentially. As Bitcoin grows in value, miners with full clients will be connected to fiber optic networks operating at multi GB speeds and everyone else can use lite clients. There is plenty of time to explore other options.


Title: Re: A proposal for a scalable blockchain.
Post by: Skybuck on November 26, 2011, 02:21:02 PM
Quote
3. Instead of storing the transactions, the balances are stored, which could cut back on data.
This would require quite a rework on the entire bitcoin network.

Not really, the balance chain would be something extra.

It could even start locally for each client without any additional protocol, a balance chain protocol could be added later on.

Quote
Right now it is based on transactions not just for the sake of it, but for flexibility which is achieved with scripts. Basically a transaction can have many different inputs and many different outputs, and many different conditions that will have to be med to claim the outputs. Just throwing this all away and simply storing balances of each address would not be compatible with this and would mean we would have to rework the whole concept.
Cool idea though.

I don't see how it would be incompatible since bit coin addresses start with a balance at 0.

Instead now bit coin addresses can be at an arbitrary value, which is validated by the balance chain, and thus inputs and outputs could be resolved up to that balance value ?!? So instead of solving it to 0, it is solved to balance value.

If you believe otherwise, then please try to explain why this would not work, or where problems/incompatibilities would be.


Title: Re: A proposal for a scalable blockchain.
Post by: cbeast on November 26, 2011, 02:44:36 PM
+∞

Explosive growth in Bitcoin use = explosive growth in Bitcoin value. Internet speed and storage space are growing exponentially. As Bitcoin grows in value, miners with full clients will be connected to fiber optic networks operating at multi GB speeds and everyone else can use lite clients. There is plenty of time to explore other options.

There won't be any explosive growth in use unless big problems like scalability are addressed first.

What is the maximum theoretical bandwidth of fiber optics? Unknown. Scalability problem will be solved through hardware in the short term, Bitcoin core development in the long term.


Title: Re: A proposal for a scalable blockchain.
Post by: Mike Hearn on November 26, 2011, 02:46:50 PM
There won't be any explosive growth in use unless big problems like scalability are addressed first.

If you agree that the current limitations on growth are software performance related (and not exchanges or economy size or whatever) then there are plenty of simpler optimizations that don't require protocol extensions, which still aren't implemented. That'd definitely be the first thing to think about. Gavin is planning to work on the most important of these, but I'm sure he'd appreciate help. Otherwise you could contribute to MultiBit/BitcoinJ which already implement a form of lightweight mode.


Title: Re: A proposal for a scalable blockchain.
Post by: cbeast on November 26, 2011, 03:05:19 PM
What is the maximum theoretical bandwidth of fiber optics? Unknown. Scalability problem will be solved through hardware in the short term, Bitcoin core development in the long term.

If scalability is a non-issue, then why is storing generic data like Namecoin banned/heavily discouraged?

Namecoin is icky  :P

Seriously, I think it won't be in the long run and can be addressed in future core development. For the time being focus should be on useful things.


Title: Re: A proposal for a scalable blockchain.
Post by: Gavin Andresen on November 27, 2011, 06:38:19 PM
If scalability is a non-issue, then why is storing generic data like Namecoin banned/heavily discouraged?

As Mike said, help on "initial headers-only download" would be much appreciated.

Work-in-progress is here:  https://github.com/bitcoin/bitcoin/tree/blockheaders

... and my notes on issues that have to be worked out are here:  https://gist.github.com/1059233

As for scalability in general:  it looks to me like CPU time to validate transactions will be the bottleneck before bandwidth or disk space, so I don't see a strong reason to switching to a 'ledger' or 'balance sheet' method. Effective optimization/scalability is all about identifying and eliminating bottlenecks.

Quote
"Premature optimization is the root of all evil (or at least most of it) in programming." -- Donald Knuth


Title: Re: A proposal for a scalable blockchain.
Post by: piuk on November 27, 2011, 11:04:44 PM
As for scalability in general:  it looks to me like CPU time to validate transactions will be the bottleneck before bandwidth or disk space, so I don't see a strong reason to switching to a 'ledger' or 'balance sheet' method.

It's worth noting that this method would reduce block validation time considerably, something which merkel pruning would not help with. I agree that there are better optimisations that can be implemented first but I still think it might be worth discussing for the future. Can you see any obvious flaws in the proposal Gavin?


Title: Re: A proposal for a scalable blockchain.
Post by: Gavin Andresen on November 28, 2011, 01:16:54 AM
I agree that there are better optimisations that can be implemented first but I still think it might be worth discussing for the future. Can you see any obvious flaws in the proposal Gavin?
Ummm, yes.

It seems to me miners will have an incentive to lie about the transaction ledger, and put fake ledger hashes in their blocks. Either so their transactions might be considered 'unspent' by unsuspecting nodes that trust them, or so that other miners that don't have the full block chain create invalid blocks (eliminate the competition!)

And I don't see a proposal that everybody check the ledger and reject blocks that contain invalid ledger hashes.

I also don't see what the ledger hash accomplishes.  If you're going to trust some other node's version of unspent-transaction-reality, then you could just ask "send me the ledger state before (or after) the block with THIS block hash".

But if you're going to trust one or more nodes anyway... then it seems to me sending an ever-increasing-in-size ledger is a bad way to get scalable. If size-of-full-blockchain becomes a problem before the mining pools and big exchanges/merchants/transactions processors all have transaction processing clusters with a terabyte of ram and petabyte hard drive array then I think extending the protocol to make it easy to request all transactions involved in a given Merkle branch will probably be the way to go.

But before then I expect the bitcoin network will look very different from the way it looks today, and I expect there will be several different solutions for how to scale up. If (when!) Bitcoin gets that successful, there will be serious money hiring the same smart people who figured out how to scale up PayPal and Visa.


Title: Re: A proposal for a scalable blockchain.
Post by: Skybuck on November 28, 2011, 02:42:39 AM
I agree that there are better optimisations that can be implemented first but I still think it might be worth discussing for the future. Can you see any obvious flaws in the proposal Gavin?
Ummm, yes.

It seems to me miners will have an incentive to lie about the transaction ledger, and put fake ledger hashes in their blocks. Either so their transactions might be considered 'unspent' by unsuspecting nodes that trust them, or so that other miners that don't have the full block chain create invalid blocks (eliminate the competition!)

And I don't see a proposal that everybody check the ledger and reject blocks that contain invalid ledger hashes.

I also don't see what the ledger hash accomplishes.  If you're going to trust some other node's version of unspent-transaction-reality, then you could just ask "send me the ledger state before (or after) the block with THIS block hash".

But if you're going to trust one or more nodes anyway... then it seems to me sending an ever-increasing-in-size ledger is a bad way to get scalable. If size-of-full-blockchain becomes a problem before the mining pools and big exchanges/merchants/transactions processors all have transaction processing clusters with a terabyte of ram and petabyte hard drive array then I think extending the protocol to make it easy to request all transactions involved in a given Merkle branch will probably be the way to go.

But before then I expect the bitcoin network will look very different from the way it looks today, and I expect there will be several different solutions for how to scale up. If (when!) Bitcoin gets that successful, there will be serious money hiring the same smart people who figured out how to scale up PayPal and Visa.

This is kinda funny.

So on one hand/side you consider a blockchain safe ?

But on the other hand/side you consider a balancechain to be unsafe ?

You can't have it both way ?!? ;) :)

The only difference so far is that the blockchain is anchored into the application and fully available vs throwing information away, but what remains could be anchored as well so not much difference there ?!?

Though how important such an anchor is remains to be seen ? Perhaps it's not important and can be worked around ?!?

If it is important then maybe a balancechain anchor could be build in as well ?

Either the bitcoin is flawed or it's not, which is it gavin ? ;)


Title: Re: A proposal for a scalable blockchain.
Post by: Skybuck on November 28, 2011, 04:38:55 AM
Nothing is stopping you from trying to make it coherent.

Let me/us know when you figured out how to make it coherent ?!?


Title: Re: A proposal for a scalable blockchain.
Post by: cbeast on November 28, 2011, 11:47:31 AM
Skybuck, Gavin is not saying you should trust anybody. Can you provide a link where Gavin talked about balancechain?

And I don't see a proposal that everybody check the ledger and reject blocks that contain invalid ledger hashes.



Title: Re: A proposal for a scalable blockchain.
Post by: Skybuck on November 28, 2011, 12:38:34 PM
The balance-blockchain (=ledger-blockchain ?) would work the same as the transaction-blockchain.

The concept of a blockchain can be applied to anything.

This includes all the blockchain-mechanisms for rejecting and accepting, etc.

This should not need to be re-explained.

Perhaps the proposal of the original/first poster is weak/insecure or worriesome.

My slightly different proposal is to make a seperate balance-blockchain which should be stronger... assuming the original/first proposal is rejected because of security concerns or incomplete proposal.

I'd like to hear objections or security concerns against my proposal the seperate balance-blockchain.


Title: Re: A proposal for a scalable blockchain.
Post by: cbeast on November 28, 2011, 12:50:05 PM
The balance-blockchain (=ledger-blockchain ?) would work the same as the transaction-blockchain.

The concept of a blockchain can be applied to anything.

This includes all the blockchain-mechanisms for rejecting and accepting, etc.

So why should we trust a balance-blockchain? Should we also have a balancechain for that, and then another for that?

This should not need to be re-explained.

Perhaps the proposal of the original/first poster is weak/insecure or worriesome.

My slightly different proposal is to make a seperate balance-blockchain which should be stronger... assuming the original/first proposal is rejected because of security concerns or incomplete proposal.

I'd like to hear objections or security concerns against my proposal the seperate balance-blockchain.


Humor me. Please explain.


Title: Re: A proposal for a scalable blockchain.
Post by: Skybuck on November 28, 2011, 03:46:29 PM
The balance-blockchain (=ledger-blockchain ?) would work the same as the transaction-blockchain.

The concept of a blockchain can be applied to anything.

This includes all the blockchain-mechanisms for rejecting and accepting, etc.

So why should we trust a balance-blockchain? Should we also have a balancechain for that, and then another for that?


The real question is why not ?

You can change the balance, but it will not be accepted by other clients, because your balance would be a mismatch.

The mismatch can happen in two ways:

1. The traditional way by: length wins, which is what bitcoin does.

2. The voting way: confirmations.

An attacker could create a fake balance chain, which could be longer, and broadcast it to you.

However this should not happen because of majority of processing power, so majority has ultimately longer balance chain.

Unless you are disconnected from the real network and you connect to a fake network, and you cannot connect to real network, same problem as with transaction block chain.

Other attacks could exist too.

The hashes in the balance chain also protect against modifications, make a single modification and you will have to find a hash collision, or re-do an entire fake balance chain which becomes computationally expensive. A balance chain might not even be necessary. Only one balance sheet needs to be stored the previous one. However balance sheets could also function as checkpoints, but that's probably not necessary.

My suggestion was also to make the difficulty 1000x greater than the blockchain, if the blockchain is allowed to be 1000 blocks ahead, if this leads to 1000x hash search times remains to be seen, maybe it will become too difficult, examination of this idea will have to be done.

So the concept of a balance-chain could be dismissed, instead a single balance sheet is stored by all clients.

It could be considered a "sliding balance sheet". It slides along the transactions over time.

However if the concept of a balance chain is dismissed, then the race can no longer happen, nobody wins.

This becomes a problem, there is no winning selection criterium anymore, except perhaps the voting situation. Though that's a bit shakey.

Therefore the idea could be simply:

Store the hashes of the balance chain only.

The balance sheets themselfes are thrown away, except the last one.

At least this makes it possible for people who have all transactions and know exactly when to calculate a balance and a balance hash to check the chain for consistency.

This also preserves the proof-of-work idea.

An attacker would have to re-do those balance chain hashes, computationally expensive.

It's even very interesting, since an attacker might not have the necessary data to make the balance chain.

Therefore attacker is restrained to new data only, less attack surface for him. This would be a lazy attacker, however it's good to assume default intense attackers which have all data.

By storing balance chain hashes only, the old transactions can be thrown away, the old balance sheets can be thrown away, what remains are the balance chain hashes, about 256 bits per hash and a recent balance sheet.


Title: Re: A proposal for a scalable blockchain.
Post by: Skybuck on November 28, 2011, 03:53:03 PM
Perhaps some terms are confusing.

A visualization of the idea:

Balance Block 0 (Anchor hardcoded into application)
+---------------------------------+
|Genesis Balance Sheet Hash   |
+---------------------------------+

Balance Block 1
+---------------------------------+
| Previous Balance Sheet Hash |
+---------------------------------+

Balance Block 2
+---------------------------------+
| Previous Balance Sheet Hash |
+---------------------------------+

Balance Block 3
+---------------------------------+
| Previous Balance Sheet Hash |
+---------------------------------+

Balance Block 4
+---------------------------------+
| Previous Balance Sheet Hash |
+---------------------------------+

Current Balance Sheet
+------------------------------------+
|  Bitcoin Address | Balance Value |
+------------------------------------+
|  Bitcoin Address | Balance Value |
+------------------------------------+
|  Bitcoin Address | Balance Value |
+------------------------------------+
|  Bitcoin Address | Balance Value |
+------------------------------------+
|  Bitcoin Address | Balance Value |
+------------------------------------+
|  Bitcoin Address | Balance Value |
+------------------------------------+
|  Bitcoin Address | Balance Value |
+------------------------------------+
|  Bitcoin Address | Balance Value |
+------------------------------------+
|  Bitcoin Address | Balance Value |
+------------------------------------+
^
All known bitcoin addresses, possibly only those with balance value above zero.

Clients only need the most recent balance block to participate. Older balance blocks can be used to check historic data.

(The transaction block chain keeps existing to stabilize the balance sheet, which will be considered stable after 1000 or 2000 blocks, the transaction block chain can later be thrown away up to the most recent/current balance sheet, or it can be stored for checking/verification purposes at the expensive of more harddisk usage, up to user/application to decide.)


Title: Re: A proposal for a scalable blockchain.
Post by: piuk on November 30, 2011, 04:17:17 PM
I agree that there are better optimisations that can be implemented first but I still think it might be worth discussing for the future. Can you see any obvious flaws in the proposal Gavin?
Ummm, yes.

It seems to me miners will have an incentive to lie about the transaction ledger, and put fake ledger hashes in their blocks. Either so their transactions might be considered 'unspent' by unsuspecting nodes that trust them, or so that other miners that don't have the full block chain create invalid blocks (eliminate the competition!)

And I don't see a proposal that everybody check the ledger and reject blocks that contain invalid ledger hashes.

I also don't see what the ledger hash accomplishes.  If you're going to trust some other node's version of unspent-transaction-reality, then you could just ask "send me the ledger state before (or after) the block with THIS block hash".

But if you're going to trust one or more nodes anyway... then it seems to me sending an ever-increasing-in-size ledger is a bad way to get scalable. If size-of-full-blockchain becomes a problem before the mining pools and big exchanges/merchants/transactions processors all have transaction processing clusters with a terabyte of ram and petabyte hard drive array then I think extending the protocol to make it easy to request all transactions involved in a given Merkle branch will probably be the way to go.

But before then I expect the bitcoin network will look very different from the way it looks today, and I expect there will be several different solutions for how to scale up. If (when!) Bitcoin gets that successful, there will be serious money hiring the same smart people who figured out how to scale up PayPal and Visa.

It would be much simpler if enforcing a correct ledger hash could be done, but it would make adoption much more difficult as it would require 50% of all miners to upgrade at once.

A solution this problem would be to have miners include the hash of previous ledger and block height they believe to be correct. During the initial blockchain download the client would continue to download blocks until there at least one ledger can be agreed upon. Some kind of decaying time window would need to be implemented so if the majority of hashing power is agreed on one "ledger chain" a minority of clients cannot force a full blockchain download.

You don't have trust any node's version of unspent reality, you have to trust 50% of the hashing power's version of unspent reality - something which you kind of have to do anyway. Although the the consequences for a malicious entity gaining 50% hashing power would be more severe (though they would need to have 50% for a long time).

Skybuck: I'm not sure were on the same page. I wasn't really talking about a separate balance chain. The problem with a chain is its a chain so grows indefinatly thus block validation time, disk space, bandwidth requirements all grow indefinitely. At some point you should be able to look into the past and  say these transactions are no longer under contention and be able to archive/compress them.


Title: Re: A proposal for a scalable blockchain.
Post by: Gavin Andresen on November 30, 2011, 04:29:59 PM
A solution this problem would be to have miners include the hash of previous ledger and block height they believe to be correct.
So go implement it and see how well it works.

Create a little HTTP-based protocol with, oh, three methods:
  • You send a block height or block hash, you get back a ledger hash.
  • You send a ledger hash, you get back the full ledger or an "I have no idea what you're talking about, that's not a valid ledger hash".
  • You send two ledger hashes, you get back the changes from one to the other or an "I have no idea what you're talking about, one of those isn't a valid ledger hash".

Then you just need to convince a bunch of semi-trustworthy people to run "ledger servers." And maybe have some mechanism for reporting when a ledger server has a bug or 'goes rogue' and reports a ledger hash that is different from everybody else.

Oh, and you might need to solve the incentive problem of "why would I run a ledger server if I'm not paid for it" (and maybe write a bunch of denial-of-service-prevention code in case some jerk with a botnet decides to ask for 10,000 full ledgers from 10,000 different IP addresses).


Title: Re: A proposal for a scalable blockchain.
Post by: Maged on November 30, 2011, 05:37:32 PM
The most obvious way to prevent this from being exploited is by incorporating it directly into the protocol as a requirement for block acceptance. Of course, this would initially require 50% of the miners in order to work, but that's standard procedure for adding new requirements to the protocol. It also might only need to be included in certain blocks, such as every difficulty change. However, that might make miners lazy and just not check the hash when it comes up, so it'd have to be included fairly commonly - likely every block. As for how it'd be checked, that's simple: it'd be independently calculated by everyone.

That being said, I don't really see what this would accomplish. The difference between this and block pruning would likely be small. People bootstrapping with this ledger still would need to at least check the block headers all the way back to the last checkpoint in order to verify that they are actually on the main chain and not a fake one that is only a few thousand blocks long that isn't actually attached to the main chain. The savings just aren't enough to justify this large overhead.

If you want to prove me wrong, go and calculate the savings that this would currently provide, along with the savings of block pruning. Bonus points if you model out, simulate, and calculate this on projected future growth.

There are only two major benefits I see this ledger proposal:
1) A client doesn't need to hold on to the merkle branch of all their unspent transactions to ensure that they will always be able to spend their coins, even if the miners don't (easily) have access to old merkle trees.
2) It is not possible for clients to provide over-pruned merkle trees (where they prune even unspent transactions) to new clients attempting to bootstrap.

On the plus side, this could be used as a low-certainty check (where this hash isn't require to be in blocks, nor is it required to be checked) so that new clients know that they have all of the unspent transactions and didn't receive over-pruned blocks. But at that point, it might just be easier to request this hash outside of the chain from trusted peers/multiple untrusted peers.


Title: Re: A proposal for a scalable blockchain.
Post by: piuk on November 30, 2011, 08:45:52 PM
If you want to prove me wrong, go and calculate the savings that this would currently provide, along with the savings of block pruning. Bonus points if you model out, simulate, and calculate this on projected future growth.

I wrote some test code to produce a vector of all unspent outputs (Basically the ledger).

Quote
#include <list>
#include <set>
#include <utility>
template < typename KeyType, typename MappedType,
typename Comp = std::less< KeyType > >
struct linked_map {
    typedef KeyType key_type;
    typedef MappedType mapped_type;
    typedef std::pair< const key_type, mapped_type > value_type;
private:
    typedef std::list< value_type >      list_type;
    typedef typename list_type::iterator list_iterator;
    struct compare_keys {
        Comp the_order;
        compare_keys ( Comp o )
        : the_order ( o )
        {}
        bool operator() ( list_iterator lhs, list_iterator rhs ) const {
            return ( the_order( lhs->first, rhs->first ) );
        }
    };
    typedef std::set< list_iterator, compare_keys > set_type;
    typedef typename set_type::iterator             set_iterator;
    list_type the_list;
    set_type  the_set;
public:
    typedef list_iterator iterator;
    typedef typename set_type::size_type size_type;
    linked_map ( Comp o = Comp() )
    : the_list()
    , the_set ( compare_keys( o ) )
    {}
    iterator find ( key_type const & key ) {
        value_type dummy_value ( key, mapped_type() );
        list_type  dummy_list;
        dummy_list.push_back( dummy_value );
        set_iterator where = the_set.find( dummy_list.begin() );
        if ( where == the_set.end() ) {
            return ( the_list.end() );
        }
        return ( *where );
    }
    iterator insert ( value_type const & value ) {
        list_type dummy;
        dummy.push_back( value );
        set_iterator where = the_set.find( dummy.begin() );
        if ( where == the_set.end() ) {
            the_list.push_back( value );
            list_iterator pos = the_list.end();
            -- pos;
            the_set.insert( pos );
            return ( pos );
        } else {
            (*where)->second = value.second;
            return ( *where );
        }
    }
    iterator erase ( iterator where ) {
        the_set.erase( where );
        return ( the_list.erase( where ) );
    }
    iterator begin ( void ) {
        return ( the_list.begin() );
    }
    iterator end ( void ) {
        return ( the_list.end() );
    }
    size_type size ( void ) const {
        return ( the_set.size() );
    }
    mapped_type & operator[] ( key_type const & key ) {
        iterator pos = insert( std::make_pair( key, mapped_type() ) );
        return ( pos->second );
    }
};

void buildTestLedger() {
    
    cout << "Building Test Ledger" << endl;
    
    float start = time(NULL);

    vector<pair<int, CBlockIndex*> > vSortedByHeight;
    vSortedByHeight.reserve(mapBlockIndex.size());
    BOOST_FOREACH(const PAIRTYPE(uint256, CBlockIndex*)& item, mapBlockIndex)
    {
        CBlockIndex* pindex = item.second;
        vSortedByHeight.push_back(make_pair(pindex->nHeight, pindex));
    }
    sort(vSortedByHeight.begin(), vSortedByHeight.end());
    
    linked_map< COutPoint, CTxOut > unspent;
    
    long originalSize = 0;
    
    BOOST_FOREACH(const PAIRTYPE(int, CBlockIndex*)& item, vSortedByHeight)
    {
        CBlockIndex* pindex = item.second;

        CBlock block;
        
        block.ReadFromDisk(pindex);
      
        originalSize += GetSerializeSize(block, SER_DISK);
        
        BOOST_FOREACH(CTransaction & tx, block.vtx) {
            
            //Check each input and remove spent
            BOOST_FOREACH(CTxIn & in, tx.vin) {
               linked_map< COutPoint, CTxOut >::iterator it = unspent.find(in.prevout);
                
                if (it != unspent.end()) {
                    unspent.erase(it);
                }
            }
            
            int ii = 0;
            
            //Add each output to unspent
            BOOST_FOREACH(CTxOut & out, tx.vout) {
                COutPoint point(tx.GetHash(), ii);
                
                linked_map<COutPoint, CTxOut>::iterator it = unspent.insert(make_pair(point, out));
                
                ++ii;
            }
        }
    }
        
    //Here you would write the ledger to disk
    
    float end = time(NULL);
    
    long ledgerSize = unspent.size() * (sizeof(COutPoint) + sizeof(CTxOut));

    cout << "Ledger generation took " << end - start << "s" << endl;

    cout << "Original disk size " << originalSize / 1024.f / 1024.f << "MB" << endl;

    cout << "Number of unspent outputs " << unspent.size() <<  " Ledger size " << ledgerSize / 1024.f / 1024.f << "MB" << endl;

    cout << (ledgerSize / (double)originalSize) * 100 << "% of the original size" << endl;
}


Sample output:

Quote
Building Test Ledger
Ledger generation took 128s
Original disk size 748.074MB
Number of unspent outputs 1212159 Ledger size 78.6083MB
10.5081% of the original size

+ You would need to hold a certain number of recent blocks, at least until all chain forks can be resolved or 2016 to calculate the difficulty target. Your probably looking at 85% reduction in disk size and the same in block validation time. I wouldn't even know where to begin calculating an for estimate for merkel pruning would you have to download the full merkel trees?

Edit: Come to think about it, assuming the requirement of a correct balance ledger was enforced you don't even need the latest full blocks. You could take the latest balance ledger from any chain head and download the block headers only to verify the proof of work. That way your looking at more than 85% reduction in chain size.


Title: Re: A proposal for a scalable blockchain.
Post by: ByteCoin on December 01, 2011, 02:30:15 AM
Building Test Ledger
...
10.5081% of the original size

Splendid! Would you be so kind as to rerun the calculations while discarding TxOuts with small BTC values, please?
Let's say anything below 0.001 can be discarded.
If you're feeling keen could you please plot a cumulative distribution of TxOut value? Possibly on a log value scale?
I have a strong feeling that the vast majority of the ledger will be taken up with very small TxOuts, many of them vanity addresses.

The fact that a ledger system seems to result in reclaiming 90% of the disc space is encouraging. Now that many bitcoins have been exchanged, the space saving results of a ledger system are more obvious than when I first proposed it over a year ago (https://bitcointalk.org/index.php?topic=505.msg18611#msg18611)

The fees system could be reworked to provide incentives for keeping the ledger small rather than the blockchain. Some transactions would result in ledger size decreasing and could perhaps be encouraged by refunding some previously paid fee.

As Gavin implies, all nodes would have to verify ledger integrity in the same way that block integrity is verified. If this were implemented, a couple of extra optimizations should accompany: the transaction signature shouldn't contribute to the hash and the signature should not be recorded in the block or ledger. This would result in further considerable space savings and pave the way for enabling transaction replacement and other advanced contract features.

Note that if the ledger is implemented as a hash tree and incoming transactions are incorporated into the tree according to a suitable algorithm then when a new block is found, each client can recalculate the ledger independently and hence the whole ledger need only be downloaded once.

ByteCoin


Title: Re: A proposal for a scalable blockchain.
Post by: finway on December 01, 2011, 02:52:27 AM
Scalability is a problem. Any proposal should be welcome.  :-[


Title: Re: A proposal for a scalable blockchain.
Post by: Gavin Andresen on December 01, 2011, 03:01:12 AM
Splendid! Would you be so kind as to rerun the calculations while discarding TxOuts with small BTC values, please?
Let's say anything below 0.001 can be discarded.

Interesting idea!

Thinking out loud...  There Doesn't Have To Be One Way To Do It.  Piuk (or somebody) should hack together a client or miner that uses a ledger system; it could refuse to relay or include any transactions with inputs smaller than 0.001 BTC, so it only needed a truncated ledger to create new blocks (if it is a miner, maybe it connects to a trusted 'traditional' bitcoin node to make sure it only builds on valid blocks which might contain tiny inputs).



Title: Re: A proposal for a scalable blockchain.
Post by: ByteCoin on December 01, 2011, 03:13:42 AM
Interesting idea!

somebody should hack together a client or miner that uses a ledger system; it could refuse to relay or include any transactions with inputs smaller than 0.001 BTC, so it only needed a truncated ledger to create new blocks

Quite so. If the transactions were sorted in the merkle tree according to value, then a client (or miner) not interested in fiddling small change could stub off whole chunks of the tree. I agree that "There Doesn't Have To Be One Way To Do It" but one has to be careful - going too far down that route can result in TxOuts being shunned. If a majority of miners don't relay and don't accept transactions using the tiny TxOuts then such transactions take a long time to confirm - if at all.

I'm in two minds as to whether this scenario is natural optimisation at work or a bad thing.

On the other hand, similar pressures exist under the current system if block chain pruning were implemented. There would be a temptation to prune small unspent transactions to free up disk space.

ByteCoin


Title: Re: A proposal for a scalable blockchain.
Post by: Atheros on December 01, 2011, 04:11:42 AM
On the other hand, similar pressures exist under the current system if block chain pruning were implemented. There would be a temptation to prune small unspent transactions to free up disk space.
If a miner did this, how would they verify all transactions? And if they don't verify transactions, how will they know what to include in blocks? If they include bad transactions in blocks then the block is bad and they just wasted a whole lot of time and electricity.


Title: Re: A proposal for a scalable blockchain.
Post by: Skybuck on December 01, 2011, 12:58:57 PM
This is against the idea of bitcoin.

Since there is a limited supply of coins, at least in theory, some day a bitcoin could be worth 1000 dollars.

Thus 0.001 could be worth 1 dollar.

Do you really want to throw away dollars like that ? I don't think so ;) :)


Title: Re: A proposal for a scalable blockchain.
Post by: piuk on December 01, 2011, 03:43:55 PM
Splendid! Would you be so kind as to rerun the calculations while discarding TxOuts with small BTC values, please?
Let's say anything below 0.001 can be discarded.
If you're feeling keen could you please plot a cumulative distribution of TxOut value? Possibly on a log value scale?
I have a strong feeling that the vast majority of the ledger will be taken up with very small TxOuts, many of them vanity addresses.

The fact that a ledger system seems to result in reclaiming 90% of the disc space is encouraging. Now that many bitcoins have been exchanged, the space saving results of a ledger system are more obvious than when I first proposed it over a year ago (https://bitcointalk.org/index.php?topic=505.msg18611#msg18611)

The fees system could be reworked to provide incentives for keeping the ledger small rather than the blockchain. Some transactions would result in ledger size decreasing and could perhaps be encouraged by refunding some previously paid fee.

As Gavin implies, all nodes would have to verify ledger integrity in the same way that block integrity is verified. If this were implemented, a couple of extra optimizations should accompany: the transaction signature shouldn't contribute to the hash and the signature should not be recorded in the block or ledger. This would result in further considerable space savings and pave the way for enabling transaction replacement and other advanced contract features.

Note that if the ledger is implemented as a hash tree and incoming transactions are incorporated into the tree according to a suitable algorithm then when a new block is found, each client can recalculate the ledger independently and hence the whole ledger need only be downloaded once.

ByteCoin

Here's the output filtering all output size <= 0.01BTC

Quote
Building Test Ledger
Ledger generation took 128s
Original disk size 750.153MB
Number of unspent outputs 733392 Ledger size 50.6512MB
6.75211% of the original size

To be honest I would have thought the saving would be greater. Regardless unless the official position is that bitcoin won't support transactions under 0.01BTC you couldn't filter them. The majority of space is taken up by transaction hashes and scripts so if there was a good way to compress these then you could reduce the size equally as well.

One possible scheme would be:
Order the ledger by transaction hash in ascending order. The hash is recorded as a var_int value between the difference of the previous hash.
Scripts are then stored at the end of the ledger, with duplicate scripts being written only once. The CTxOut then points to an script index instead of having the script written inline.

I will play around with some different formats.

Adjusting fees based on the ledger size effect is an excellent idea.




Title: Re: A proposal for a scalable blockchain.
Post by: finway on December 06, 2011, 09:44:03 AM
Gavin says CPU is the bottleneck right now,
recently i downloaded the whole blockchain again (it takes me almost whole day), and i see the harddisk light shining all along, and the CPU usage is 20%~30%.

Maybe some blk0001.dat reorganize will help?



Title: Re: A proposal for a scalable blockchain.
Post by: gmaxwell on December 07, 2011, 02:24:31 PM
If a miner did this, how would they verify all transactions? And if they don't verify transactions, how will they know what to include in blocks? If they include bad transactions in blocks then the block is bad and they just wasted a whole lot of time and electricity.

They don't include transactions they can't verify.  Their only risk is that someone else wasted a lot of time and electricity in the prior block they received to make them waste a lot of time and electricity.

It's also possible to do the opposite of what bytecoin suggests e.g. _don't_ sort the tree by the sizes, sort it by txn hash, so that you can't escape the space requirement of small transactions easily. (if you prune them out of the ledger you'll have to keep their hashes in order to be able to update the tree— so the space savings isn't great).   

In any case if the ledger is committed the pruning is less bad because someone who wants to spend a pruned input could provide the tree fragment themselves... though it's not clear to me how a miner could add new outputs to a size sorted tree for a size they weren't maintaining.

Another random point is that where the ledger is deterministic the bigger issue is attackers intentionally creating unbalanced trees, so you'd have to penalize txn outs which would attach at ugly (too deep) points.



Title: Re: A proposal for a scalable blockchain.
Post by: etotheipi on May 07, 2012, 01:18:03 AM
Has anyone considered this idea using address/P2SH hashes as the "key" of the tree instead of TxOuts/OutPoints?   By doing this, lite-weight nodes could retrieve the entire unspent-TxOut-list of an address by downloading only a couple kilobytes but still have the same security of a full-node!     

Like the OP's generic merkle-tree-of-unspent-TxOuts, this would provide the same level of compression but additionally enable lightweight nodes to verify address balances within seconds.  That's because the leaf nodes of the ledger tree are actually lists of unspent outputs, instead of individual outputs (organized in sub-merkle-trees).

For convenience, I will use the vague term "recipient" to refer to the pub-key-hash160, or a P2SH script.  I'm sure a scheme could be created that takes into account arbitrary TxOut scripts and still have benefit of uniform distribution.  Here's how the "ledger" is created on a given block:

  • All nodes traverse the blockchain and collect unspent outputs (can be done on an already pruned tree to be updated)
  • The unspent outputs are sorted by their recipient, which can be done in linear time using bucket-sort (because the sorting keys are hashes which are uniformly distributed)
  • For each recipient all of its unspent outputs are put into a merkle sub-tree.
  • The sub-tree root for each individual recipient is put into a master merkle tree
  • The fingerprint of the blockchain on a given block is the master-merkle-tree root.

Therefore, each address/recipient is one leaf node of the master merkle tree, and the value of each leaf node is the merkle root of the unspent-TxOut-list for that recipieint.

There's plenty of discussion already about how this fingerprint is distributed, so we assume that all full nodes have it and agree on it for each recipient.  So now you are a lite-weight node getting onto the network for the first time.  You have a list of addresses ("recipients") to collect balances for.  

  • You download the headers (80 bytes each) plus the ledger-fingerprint at each block (32 bytes each).  Compute the longest chain.
  • For each address, you query your peers with the recipient string.
  • Each peer responds with the sub-merkle-root of that recipient, along with the entire merkle branch -- which should be 2*log(N) hashes, where N is the number of "recipients" in the blockchain with balance > 0.  If the blockchain has 100,000,000 addresses, this is about 2 kB worth of data.
  • You verify the merkle branch against the ledger fingerprint for this block.  You now know that the sub-merkle-root is valid.
  • You request the sub-merkle tree from your peers (which is just the unspent TxOut list for that address).
  • You compute the merkle-root of the unspent TxOuts and verifies it matches the sub-merkle-root previously verified.

As long as the ledger-fingerprint (which is just an enormous-merkle-tree root) is somehow included in the protocol, then lite-clients could import addresses and verify balances for only a couple kilobytes and with the same security as downloading the whole blockchain!  

Oh, and you'd get the benefit of blockchain pruning, too.  A small bonus...


Title: Re: A proposal for a scalable blockchain.
Post by: cbeast on May 07, 2012, 03:22:35 AM
Has anyone considered this idea using address/P2SH hashes as the "key" of the tree instead of TxOuts/OutPoints?   By doing this, lite-weight nodes could retrieve the entire unspent-TxOut-list of an address by downloading only a couple kilobytes but still have the same security of a full-node!   
So now you are a lite-weight node getting onto the network for the first time.
Oh, and you'd get the benefit of blockchain pruning, too.  A small bonus...

Someone had an epiphany.


Title: Re: A proposal for a scalable blockchain.
Post by: gmaxwell on May 07, 2012, 03:29:30 AM
Has anyone considered this idea using address/P2SH hashes as the "key" of the tree instead of TxOuts/OutPoints?

Then I arrange a bunch of transactions under that key and unbalance your tree. Updates become O(N). Yuck.   (And bumping zombie threads? double yuck!)


Title: Re: A proposal for a scalable blockchain.
Post by: BrightAnarchist on May 07, 2012, 03:32:40 AM
watching


Title: Re: A proposal for a scalable blockchain.
Post by: etotheipi on May 07, 2012, 08:21:55 PM
Then I arrange a bunch of transactions under that key and unbalance your tree. Updates become O(N). Yuck.

I don't follow.  Or rather, I don't know why this is any more computationally inconvenient than a pure unspent-txout-list-merkle-tree.  Much of the tree would probably have to be recomputed anyway when you have hundreds of transactions removing and adding branches to the tree.  The only difference is that your main tree now only holds all "recipients" in stead of all unspent outputs, and each recipient has its own sub-tree of unspent outputs.   Either way, you have N unspent outputs that have to be represented in the tree, and updating 300 of them on a new block still requires substantial recomputations.  

In this case, each op requires updating a subtree and then the master tree after that.  If all unspent outputs were conentrated behind a single recipient, then we're back to the original proposal of unspent-TxOut-list trees, which was the basis for this thread.

Unless your point has to do with the general feasibility of maintaining such a tree.  Then it's a fair point, but I'm simply building on the original proposal (and its flaws).  

If one argument against it was that the 90% space savings isn't worth the tremendous complexity changes to the client and protocol, then my question is "is it worth revisiting it knowing that a change of similar complexity could save you 90% and give nodes the ability to verify balances without downloading the whole tree?"


(And bumping zombie threads? double yuck!)

Would you have preferred I started a new thread?  Then you could complain about the dozens of threads already out there discussing this general topic.  Plus, this isn't a "bump"... it's a legitimate expansion of the exact, original idea proposed in this thread.





Title: Re: A proposal for a scalable blockchain.
Post by: piuk on June 15, 2012, 11:51:27 AM
Up to date test re-run:

Quote
Building Test Ledger
Ledger generation took 128s
Number of unspent outputs 1053033
Original disk size 1716.7MB
Compressed Ledger size 71.9706MB
4.19238% of the original size

Since the last test the blockchain has more than doubled in size but the ledger has only increased by 20MB.

In my opinion this is the only way that bitcoin will scale. Merkel tree pruning is a dead end because not even the centralised nodes holding the full blockchain will be able to scale.

Keep variable number of blocks (Default ~2 weeks) unspent outputs in blocks older than that are compressed into a ledger.


Title: Re: A proposal for a scalable blockchain.
Post by: Realpra on June 15, 2012, 03:08:19 PM
I like this proposal and I think it would work.

However to avoid miners printing money I think the snapshots should be larger than 2 weeks: If someone had 51% for two weeks (not THAT hard to imagine) he could basically rewrite ledger history.

Maybe make it 5 YEARS: Keeps the chain from becoming infinite, but is still quite safe + most inputs before such date should have been spent, so the ledger becomes smaller too.

Win-win.


Anyone from this thread like my swarm proposal? (other thread)