Distributed TX Lists and TX flooding defense.

etotheipi (OP)

Legendary

Offline

Activity: 1428
Merit: 1093

Core Armory Developer


July 04, 2011, 10:42:20 PM

#1

I noticed last time I started a new client-install from scratch that it took me more than a full day to get up to the current block. I have also observed over the last few weeks that the daily "Bitcoins Sent" figure on bitcoinwatch.com is consistently over 1 million. It's tough to believe that the BTC community is so active that more than 20% of all BTC changes hands every day. I suspect some people have recognized that they can add a ton of bloat to the network by juggling transactions between their own addresses. My understanding is that if they make sure every transaction has only 1 input and 1 output, they won't have to pay a transaction fee. Is this correct? Regardless, it's pretty inconvenient to acquire the entire block chain from the BTC network. I want to make sure I understand this, because I want to build a client that gracefully avoids being bloated like this:

(1) Right now, there's only about 11 MB worth of block headers (135,000 blocks x 80 bytes/block), which should take a very short amount of time to download, and mere seconds to verify hash-integrity of the entire blockchain [headers]. However, the global transaction list is considerably larger, and currently downloaded in its entirety by the official BTC client, all the time.
(2) The block headers give no information about which transactions were included in the block, only the Merkle tree root (a hash) of the transaction list for that block. So if the client wants to know and verify a BTC address balance (without anyone else's help), it has to download the transaction list for every block, at least back to the original coinbase transactions of all the coins contained in the address.
(3) You don't need the entire block chain to send/receive BTC, you only need the block headers. A client can create transaction messages and sign them without transaction lists. It couldn't verify whether the transaction was valid, but the transaction will be rejected by the network anyway, if it's not valid.
(4) Only miners would need the entire block chain, so that they can successfully, and quickly, verify the transactions they are trying to include in their blocks.
(5) A client that receives only the headers (assuming it's the right chain), doesn't have to trust the other nodes around him so much when requesting data. For instance, if he requests and receives the transaction list for only block X, he can quickly construct and verify the merkle tree against the block X header. He can't verify the entire history, but presumably, if he has the longest/correct blockchain, those transactions must've been valid to be included in a block.
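
The check described in (5) can be sketched as follows. This is a minimal illustration, not code from any client: it assumes the transaction hashes are supplied in internal byte order (not the reversed hex form shown by block explorers), and the function names are made up for this example. The header layout used (Merkle root at bytes 36 to 67 of the 80-byte header) and the rule of duplicating the last hash on odd-sized levels are the standard Bitcoin ones.

```python
import hashlib
from typing import List


def dsha256(data: bytes) -> bytes:
    """Bitcoin's double SHA-256."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(txids: List[bytes]) -> bytes:
    """Fold a list of 32-byte transaction hashes into a Merkle root."""
    if not txids:
        raise ValueError("a block always contains at least the coinbase tx")
    level = list(txids)
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Odd number of nodes: the last hash is paired with itself.
            level.append(level[-1])
        level = [dsha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


def tx_list_matches_header(header: bytes, txids: List[bytes]) -> bool:
    """True if the tx list hashes to the Merkle root stored in the header."""
    if len(header) != 80:
        raise ValueError("block headers are exactly 80 bytes")
    # Layout: version(4) | prev hash(32) | merkle root(32) | time | bits | nonce
    return header[36:68] == merkle_root(txids)
```

A header-only client would call `tx_list_matches_header` on whatever transaction list a peer hands over for block X. A mismatch means the list was altered or belongs to a different block.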

Are (1)-(4) correct assumptions? Perhaps the transaction validity assumption in (5) is weak, since he could've been fed fake tx lists and headers by a dishonest node trying to trick him, knowing he won't follow the transaction history. Though, any dishonest node would have to have a ton of power to even produce a single bogus block, and surely, the client would get word of the longer/correct blockchain within seconds or minutes.
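
To make the "ton of power" point concrete, here is a rough sketch of what a header-only client can check by itself: each header must hash below the target encoded in its own `bits` field, and must point at the hash of the header before it. The function names are illustrative. The sketch deliberately skips the difficulty-retarget rules and does not decide which of several competing chains is the longest, so it is only part of what a real client needs.

```python
import hashlib
import struct
from typing import List

HEADER_SIZE = 80


def dsha256(data: bytes) -> bytes:
    """Bitcoin's double SHA-256."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def bits_to_target(bits: int) -> int:
    """Expand the compact 'bits' field into the full 256-bit target."""
    exponent = bits >> 24
    mantissa = bits & 0x007FFFFF
    if exponent <= 3:
        return mantissa >> (8 * (3 - exponent))
    return mantissa << (8 * (exponent - 3))


def verify_header_chain(headers: List[bytes]) -> bool:
    """Check proof of work and prev-hash linkage for consecutive headers."""
    prev_hash = None
    for raw in headers:
        if len(raw) != HEADER_SIZE:
            return False
        block_hash = dsha256(raw)
        (bits,) = struct.unpack_from("<I", raw, 72)  # bits sits at offset 72
        # The hash is compared to the target as a little-endian integer.
        if int.from_bytes(block_hash, "little") > bits_to_target(bits):
            return False
        # Bytes 4..35 hold the hash of the previous header.
        if prev_hash is not None and raw[4:36] != prev_hash:
            return False
        prev_hash = block_hash
    return True
```

At 80 bytes per header, 135,000 headers come to 10.8 MB, so running this over the whole chain costs one double hash per block. Forging even one header that passes the proof-of-work check costs as much as mining a real block.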

So first of all, is it possible to package up the transaction lists into something like bit-torrent file and distribute them as one giant chunk of compressed data? The first 134,000 blocks aren't going to change, so why require the BTC nodes/network to fill all the block-data requests? Bit-torrent is designed to handle this. And if I'm not mistaken, someone acquiring 1 GB of block data should be able to download that in like 10 minutes, and verify the entire set in less than 10 minutes on a modern computer. This would require bit-torrent protocol to be included in the BTC client, but seems it would be worth it. Package up every 5000 blocks into a new btc_Blk0_to_Blk1.torrent file and let the block-chain be distributed that way. The BTC network would only have to handle requests for the most-recent blocks.
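
As a small illustration of the packaging scheme, the sketch below splits the chain into fixed 5000-block chunks and names each one. Only complete chunks are listed, on the assumption that the newest, still-changing blocks stay on the normal BTC network. The exact file-name format is a guess based on the `btc_Blk0_to_Blk1.torrent` pattern above, and the function names are hypothetical.

```python
from typing import List, Tuple


def chunk_ranges(tip_height: int, chunk_size: int = 5000) -> List[Tuple[int, int]]:
    """Return (first, last) block heights for every complete chunk."""
    ranges = []
    first = 0
    # Only emit a chunk once its last block already exists in the chain.
    while first + chunk_size - 1 <= tip_height:
        ranges.append((first, first + chunk_size - 1))
        first += chunk_size
    return ranges


def torrent_name(first: int, last: int) -> str:
    """Hypothetical file name following the pattern suggested in the post."""
    return f"btc_Blk{first}_to_Blk{last}.torrent"


if __name__ == "__main__":
    for first, last in chunk_ranges(135000):
        print(torrent_name(first, last))
```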

Second, would it be possible to implement a new kind of message on the BTC network that allows lightweight clients to leverage heavyweight clients to provide only the relevant transaction history, if they don't want the whole block chain? The lightweight client knows that address X is mine, and wants to verify the balance and integrity of the address. So he sends out this special request for address X, and a client on the network that has the full chain can send a list of block numbers/hashes that are relevant to that address. Then, the lightweight client only needs to request that list of full blocks from the network. Again, since he has the "correct" blockchain headers, there should be no problem trusting arbitrary nodes to give him the right block listing. The client can continue to download every new block as it is broadcast, but will discard it if it's not relevant to himself.

I want to extend this discussion, but this message is long enough already! Maybe I'll stop there until I know my assumptions are correct.
-Eto

Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here! (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)

deepceleron

Legendary

Offline

Activity: 1512
Merit: 1028

Re: Distributed TX Lists and TX flooding defense.

July 05, 2011, 01:32:18 AM

#2

or just include starter blocks with the client: http://sourceforge.net/projects/bitcoin/files/Bitcoin/blockchain/

etotheipi (OP)


Re: Distributed TX Lists and TX flooding defense.

July 05, 2011, 01:50:27 AM

#3

Well, that's a start, if your computer is going to hold the entire block chain. I assume sourceforge can handle the download stress, even if this was linked from the main Bitcoin website. On the other hand, for such a huge amount of data that should be the same everywhere, the client could P2P download it in minutes from the other nodes with no stress on anyone's system.

As for the lightweight clients, is my understanding correct?

(1) You could get by with nothing but the headers, and the set of blocks that are "relevant" to your wallet
(2) It would not be possible to know which blocks are relevant to you, without first downloading them
(3) A new message could be added that allows one to request a list of blocks for a specific address, and nodes could handle the requests efficiently
(4) You would only need as many blocks as necessary to find the coinbase transactions of all inputs ever sent to your addresses

-Eto


gmaxwell

Moderator
Legendary

Offline

Activity: 4158
Merit: 8382

Re: Distributed TX Lists and TX flooding defense.

July 05, 2011, 09:54:19 PM

#4

Quote from: etotheipi on July 04, 2011, 10:42:20 PM

I noticed last time I started a new client-install from scratch that it took me more than a full day to get up to the current block.

FWIW, this is most likely due to the flood protection logic, which has recently been fixed in git. With the prior code, nodes you were pulling the chain from would disconnect you for flooding after they decided to send you too much data in response to your getblock requests.

You could potentially get disconnected from every clueful node you attempted until you ended up connected to a bunch of nodes which were as ignorant as yours.

With the new change, a sync-up which took >8 hours before now finishes in about 35 minutes for me, most of that spent CPU-bound validating the data. Unfortunately the benefit won't be realized until the bulk of the network upgrades to the not-yet-released .24.

Of course, lite clients are a good and planned thing, but the current poor performance is mostly due to bugs and missing features, not the lack of a lite client.

etotheipi (OP)


Re: Distributed TX Lists and TX flooding defense.

July 05, 2011, 10:58:54 PM

#5

Okay, that's good to know. I assumed it had to do with the BTC network not being ideal for moving large amounts of data, and recognizing that there's an existing decentralized protocol for doing so. If the new client only takes a couple hours to update the block chain, then bringing bit-torrent into the mix is not going to provide a whole lot of extra benefit.

So let me switch to the other side of this question. Given that I want to design a lite-client, I need to know what is the minimal amount of information from the blockchain that can be used to efficiently process transactions. It sounds like the entire array of block-headers is not only really useful, but also space-efficient. The actual transaction lists themselves are not so nice. My understanding is that there is not currently a protocol for getting specific blocks in the manner described below:

Message Type: Address chain
Payload: block hash/number + relevant address

Response:
Payload: Up to 1000 block hashes/numbers, that represent the 1000 blocks before the supplied block, that are relevant to the supplied address. This would require the responding node to follow the transaction chain(s) for that particular address and accumulate blocks until it reaches 1000. Then the client can issue a new request for each dangling transaction that still has to be followed to the original coinbase transactions. Finally, the client could use that list to collect all the blocks he needs and discard the rest.

1) Is this sufficient for balance/transaction verification? Or is it "safe" to just accumulate the "latest" transactions which provide the balance information necessary?
2) Is this likely to be a ton of data, anyway? It seems all you would need is for ANY input address along the way to have a long/active history, and you'd end up having to download 10%+ of the blocks for your own address
3) Since there can be multiple inputs for a particular transaction, there may be a bit of branching, and I'm not sure exactly which blocks the responding node would include in his list of 1000.
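
For discussion, the proposed request/response pair could look roughly like the sketch below. Everything here is hypothetical: no such message exists in the protocol, and the class and function names are invented. It also makes a loud simplification: the responding node is assumed to keep an index from address to the heights of blocks that touch that address directly, and it does not follow input chains back to the coinbase transactions, which is exactly the branching problem raised in question 3.

```python
from dataclasses import dataclass
from typing import Dict, List

MAX_RESULTS = 1000  # cap per response, as proposed above


@dataclass(frozen=True)
class AddressChainRequest:
    start_height: int  # only blocks strictly before this height are returned
    address: str


@dataclass(frozen=True)
class AddressChainResponse:
    heights: List[int]  # relevant block heights, newest first
    more: bool          # True if older relevant blocks were left out


def handle_address_chain(req: AddressChainRequest,
                         index: Dict[str, List[int]]) -> AddressChainResponse:
    """Answer a request from a full node's (assumed) address-to-height index."""
    relevant = sorted(
        (h for h in index.get(req.address, []) if h < req.start_height),
        reverse=True,
    )
    return AddressChainResponse(
        heights=relevant[:MAX_RESULTS],
        more=len(relevant) > MAX_RESULTS,
    )
```

A client would page backwards by re-sending the request with `start_height` set to the last height it received until `more` is false, then fetch those full blocks and check each one against the headers it already holds.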


TierNolan

Legendary

Offline

Activity: 1232
Merit: 1083

Re: Distributed TX Lists and TX flooding defense.

July 05, 2011, 11:39:54 PM

#6

Quote from: etotheipi on July 05, 2011, 10:58:54 PM

My understanding is that there is not currently a protocol for getting specific blocks in the manner described below:

You can't trace backwards. There was a suggestion somewhere to allow filtering of block data. You could ask for all blocks, but only include certain transactions.

One issue is that you would have to trust the other node. The only way to be sure that there was no double spend is to verify all blocks since the coin was last used.

For download, the fastest plan would be to download the entire chain and then work backwards filling the chain out. The latest blocks are likely the most important anyway, especially for new users.

1LxbG5cKXzTwZg9mjL3gaRE835uNQEteWF

etotheipi (OP)


Re: Distributed TX Lists and TX flooding defense.

July 06, 2011, 01:07:39 AM

#7

I thought each transaction had a list of output transactions from previous blocks as inputs. Perhaps I don't fully understand this phenomenon. I'll look at the specification for TxIn and TxOut a little closer.

But what I'm proposing doesn't require picking out double-spends. Tell me otherwise, but I'm pretty sure only the nodes that are mining need to have the whole blockchain, so they can verify there is no double-spending. The only thing I want the lite-client on my phone to do is check our balance. And as long as the merkle tree roots of the transactions provided by other nodes are consistent with the block headers, this should be alright.

Or am I missing something? Perhaps the client just downloads all the blocks in batches of 1000, and only keeps the ones it needs. If it doesn't take 3 days to download the blockchain anymore, this might be okay.

-Eto


etotheipi (OP)


Re: Distributed TX Lists and TX flooding defense.

July 06, 2011, 03:39:37 AM

#8

I want to create an Android app that allows you to confirm receipt of your BTC, but doesn't have to hold the entire block chain. My understanding is that if it contains the block headers, it can trust any single block + transaction-list provided by an arbitrary node, because the merkle root would have to match the block header root, the block header contains the hash of the previous block, and the block header's hash is a very compute-intensive value. It is extremely difficult to create a fake one.

The only way for a dishonest node to deceive you would be to calculate a valid block by itself and try to funnel it to you alone right before you accept the transaction. Perhaps they want to double-spend against you. They would have to create a fake block with a tx from some arbitrary account to their input account, so that your client believes their input account has the money and the transaction is ultimately valid. But then the attacker never releases the block to the network, and the next real block computed doesn't include the transaction.

The problem is, calculating that block by the attacker would be exceptionally difficult, the window in which they have to execute the attack would be tiny (30s-10min) since a new block from the network would invalidate it, and they could get 50 BTC by just broadcasting the valid block. Therefore, for a transaction less than 50 BTC, it doesn't even make sense for an attacker to try to deceive you. In all likelihood, you won't be dealing with 50+ BTC transactions on your phone, anyway.

Therefore, if all you have is the block headers, you can just download the next 1-2 full blocks after the transaction was sent, and that would be evidence enough that your account actually received the money. If the transaction wasn't valid, it wouldn't have been included in any blocks at all. So a lite client could confirm transactions after only a single block, as long as those transactions are less than 50 BTC. And it would really only have to hold the block headers, and a few full-blocks at a time...?

-Eto


TierNolan


Re: Distributed TX Lists and TX flooding defense.

July 06, 2011, 08:33:57 AM

#9

Quote from: etotheipi on July 06, 2011, 03:39:37 AM

I want to create an Android app that allows you to confirm receipt of your BTC, but doesn't have to hold the entire block chain. My understanding is that if it contains the block headers, it can trust any single block + transaction-list provided by an arbitrary node, because the merkle root would have to match the block header root, the block header contains the hash of the previous block, and the block header's hash is a very compute-intensive value. It is extremely difficult to create a fake one.

Right, if it is your own coin, then you don't have to worry about double spending.

All you would need to store on your phone would be

<block hash of some block>:<balance as of the block>

You could then ask for all blocks since then.

Once that is downloaded, you could just save

<hash of block 10 blocks back from the head>:<balance as of that block>
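
The checkpoint scheme above can be sketched like this. It is only an illustration under stated assumptions: the phone has already downloaded and verified the blocks after its checkpoint and has worked out, per block, the net change to its own balance; the names `Checkpoint`, `advance_checkpoint` and `REORG_MARGIN` are invented for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple

REORG_MARGIN = 10  # keep the checkpoint this many blocks behind the head


@dataclass(frozen=True)
class Checkpoint:
    block_hash: str
    balance: int  # wallet balance as of block_hash, in the smallest unit


def advance_checkpoint(cp: Checkpoint,
                       new_blocks: List[Tuple[str, int]]) -> Checkpoint:
    """Move the checkpoint forward, staying REORG_MARGIN blocks behind.

    new_blocks lists (block_hash, net balance change for this wallet)
    for every block after the checkpoint, oldest first.
    """
    if len(new_blocks) <= REORG_MARGIN:
        return cp  # not enough new blocks to move the checkpoint safely
    settled = new_blocks[:-REORG_MARGIN]
    return Checkpoint(
        block_hash=settled[-1][0],
        balance=cp.balance + sum(delta for _, delta in settled),
    )
```

The newest ten blocks are never folded into the stored balance, so a short reorganization of the chain tip only requires re-downloading from the checkpoint rather than from scratch.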

This would mean very little to store on the phone and also mean you just need to download

Quote

In all likelihood, you won't be dealing with 50+ BTC transactions on your phone, anyway.

The recommendation is that you don't count a transaction as confirmed until the block with the transaction is at least 6 blocks back in the chain.

There is a time/reliability tradeoff. For small trades, it might be worth just absorbing double spends.
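
The tradeoff can be written down as a small policy function. This is a sketch with assumptions spelled out: confirmations are counted in the usual way (the block containing the transaction counts as 1), "6 blocks back" is read as 6 confirmations, and the rule of accepting a single confirmation for small amounts is a hypothetical policy with a threshold the wallet owner has to pick, not a recommendation from this thread.

```python
from typing import Optional


def confirmations(tx_height: Optional[int], tip_height: int) -> int:
    """Number of confirmations; 0 if the tx is not in any block yet."""
    if tx_height is None:
        return 0
    if tx_height > tip_height:
        raise ValueError("transaction block is above the known chain tip")
    return tip_height - tx_height + 1


def required_confirmations(amount: int, small_amount: int) -> int:
    """Hypothetical policy: 1 block for small amounts, 6 otherwise."""
    return 1 if amount <= small_amount else 6


def is_settled(tx_height: Optional[int], tip_height: int,
               amount: int, small_amount: int) -> bool:
    """True once the tx has enough confirmations for its size."""
    needed = required_confirmations(amount, small_amount)
    return confirmations(tx_height, tip_height) >= needed
```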

Quote

Therefore, if all you have is the block headers, you can just download the next 1-2 full blocks after the transaction was sent, and that would be evidence enough that your account actually received the money.

Right.


etotheipi (OP)


Re: Distributed TX Lists and TX flooding defense.

July 06, 2011, 01:38:09 PM

#10

If someone broadcasts an invalid transaction to the network (perhaps moving coins from one account with not enough funds), how far will that transaction get? Is it shot down immediately by the first node that sees it? Is it propagated and each node can decide whether it is valid? I'm curious about when someone sends me coins and I see the transaction immediately, but then wait for confirmation. Would I even see the transaction if other nodes didn't think it was valid? Or is it simply up to the miners to check validity and include it in their blocks if it is?

In other words, how much can I trust a 0-confirmation transaction? I know that's very "weak". But a lot of "weaknesses" are still near impossible to exploit, and for a few hundredths of a BTC, it's probably fine to assume no one is going to the effort to do that.


TierNolan


Re: Distributed TX Lists and TX flooding defense.

July 06, 2011, 01:42:59 PM

#11

Quote from: etotheipi on July 06, 2011, 01:38:09 PM

If someone broadcasts an invalid transaction to the network (perhaps moving coins from one account with not enough funds), how far will that transaction get?

I think the first node will store the transaction, in case it turns out to be valid, but will not broadcast it.

However, that assumes friendly nodes.

Quote

I'm curious about when someone sends me coins and I see the transaction immediately, but then wait for confirmation. Would I even see the transaction if other nodes didn't think it was valid? Or is it simply up to the miners to check validity and include it in their blocks if it is?

If the buyer was to buy 2 things at exactly the same time, then both transactions could partially propagate and each take part of the network.

Quote

In other words, how much can I trust a 0-confirmation transaction? I know that's very "weak". But a lot of "weaknesses" are still near impossible to exploit, and for a few hundredths of a BTC, it's probably fine to assume no one is going to the effort to do that.

That is a judgment call, but for small transactions, you can probably just absorb the small random losses.
