Messing with queries

Hepatizon (OP)

Newbie

Activity: 16
Merit: 0

July 19, 2010, 05:56:44 AM

#1

The Bitcoin network accepts blocks as valid if enough of the nodes accept the block as valid. The inherent security in this is that an attacker trying to manipulate which blocks are accepted as valid or not would need computing power approaching 50% of the entire network. But it is infeasible to actually query all or even most of the entire network (or at least it will be once the Bitcoin network becomes larger) so in reality each node must take a sample of the network. If the sample is large enough and randomly distributed over the entire network, there is no problem. But if a given node preferentially connects to a smaller subset of nodes, then that node is vulnerable - an attacker need only compromise that smaller subset. Obviously it is in the interests of each node operator to ensure that his node is truly unbiased in determining which node to get block data from.

So, how does the Bitcoin client determine which nodes to connect to? I haven't gone through the source thoroughly enough to figure it out yet, but as far as I can tell it either asks the IRC channel who is online and queries those nodes, or sends a request to IRC directly and then gets a response from an arbitrary node? I think?

I have two main concerns:
1. Is there any 'first-responder' bias in determining which node to ask for block data from? If there is, then a malicious user with access to a very low-latency connection could disrupt the network by responding extremely quickly to any request with some form of false data - perhaps simply by saying that no new block has been hashed yet - and get disproportionate influence over the consensus this way. If HighSpeedCheater can supply information so fast that he responds first 99% of the time to your query, then you need to do 68-69 queries just to get 50/50 odds that you reach a single node that isn't a HighSpeedCheater sockpuppet - and that node itself might be fooled by HighSpeedCheater and unwittingly be repeating false information. Probably unrealistic for an individual to pull this off, but I can see a big, widely distributed system like a botnet or Google being able to do it. Even then, nodes could get around it just by checking everything 140x as much as they used to.

2. Can a 'broadcaster' node influence which nodes it responds to? For example, could Alice set up her system so that every time Bob asks for the most recent block chain, she is the one (or disproportionately more likely to be the one) to answer him? If so, then Bob is basically at Alice's mercy unless he can figure out which addresses Alice is using faster than she can create new sockpuppets.
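The query count in point 1 comes from requiring 1 - p^n >= target, where p is the chance a single query is answered by a cheater. The small helper below checks the arithmetic. It assumes every query is independent and is answered by a cheater with the same probability; `QueriesNeeded` is a hypothetical name, not a function from the client.

```cpp
#include <cmath>

// Number of independent queries needed so that the chance of reaching at
// least one honest node is at least `target`, when each query is answered
// by a cheater with probability p (assumed identical and independent).
//   1 - p^n >= target  <=>  p^n <= 1 - target  <=>  n >= ln(1 - target) / ln(p)
// (Dividing by ln(p) < 0 flips the inequality.)
int QueriesNeeded(double p, double target) {
    return static_cast<int>(std::ceil(std::log(1.0 - target) / std::log(p)));
}
```

With p = 0.99, `QueriesNeeded(0.99, 0.5)` returns 69, matching the 68-69 estimate; `QueriesNeeded(0.99, 0.75)` returns 138, so checking roughly 140x as much buys about 75% odds rather than certainty.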

There's also the possibility that the entire transaction chain could fork if the latency between any two approximately equal computing groups becomes larger than the average block-creation time. Suppose that there is roughly the same amount of computing power dedicated to hashing Bitcoin blocks in two groups, say the US and Russia. If - and I can only see this happening if the time between new blocks is very short - it takes more time for a US node to ask a Russian node what the longest block chain is (and vice versa) than it takes for the Russian (or US) Bitcoin network to generate a new block, then chains could develop local to Russia and the US. Suppose a node in the US and one in Russia successfully hash the next block at about the same time; call them blocks 1001us and 1001ru. The network resolves the discrepancy by seeing whether 1002us or 1002ru gets hashed first. But since there's a delay in getting information from one country to the other, every node in the US that checks a Russian node for its longest chain length gets out-of-date information, so it will see an older, shorter chain than the chain it gets from local, lower-latency nodes. The same thing happens to the Russians, with the end effect that American nodes keep working on the branch containing 1001us because they can only see an older, shorter version of the 1001ru branch, while the Russians work on theirs for the same reason. As long as new blocks are found faster than information can travel between the two groups, neither chain will ever dominate the other.
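One rough way to put numbers on the "latency vs. block time" condition is to ask how likely the other group is to find a competing block while your block is still in transit. The sketch below assumes block discovery behaves like a Poisson process (an idealization) and that group B holds a fraction `shareB` of the total hash power; the function name and the latency figures are hypothetical. It only estimates the chance that a fork starts, not whether it persists.

```cpp
#include <cmath>

// Probability that group B finds a competing block while group A's block is
// still propagating to it. Assumes Poisson block discovery: B finds blocks at
// rate shareB / intervalSec, where intervalSec is the network-wide mean time
// between blocks.
double CompetingBlockProbability(double latencySec, double intervalSec, double shareB) {
    double rateB = shareB / intervalSec;         // B's blocks per second
    return 1.0 - std::exp(-rateB * latencySec);  // P(at least one B block during the delay)
}
```

With a hypothetical 2-second delay and an even split, a 600-second interval gives about 0.17%, a 2-second interval about 39%, and a 0.5-second interval about 86%, which is why the scenario only looks plausible if block times shrink toward the propagation delay.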

bdonlan

Full Member

Activity: 221
Merit: 102

Re: Messing with queries

July 19, 2010, 06:53:04 AM

#2

Quote from: Hepatizon on July 19, 2010, 05:56:44 AM

The Bitcoin network accepts blocks as valid if enough of the nodes accept the block as valid. The inherent security in this is that an attacker trying to manipulate which blocks are accepted as valid or not would need computing power approaching 50% of the entire network. But it is infeasible to actually query all or even most of the entire network (or at least it will be once the Bitcoin network becomes larger) so in reality each node must take a sample of the network. If the sample is large enough and randomly distributed over the entire network, there is no problem. But if a given node preferentially connects to a smaller subset of nodes, then that node is vulnerable - an attacker need only compromise that smaller subset. Obviously it is in the interests of each node operator to ensure that his node is truly unbiased in determining which node to get block data from.

So, how does the Bitcoin client determine which nodes to connect to? I haven't gone through the source thoroughly enough to figure it out yet, but as far as I can tell it either asks the IRC channel who is online and queries those nodes, or sends a request to IRC directly and then gets a response from an arbitrary node? I think?

I have two main concerns:
1. Is there any 'first-responder' bias in determining which node to ask for block data from? If there is, then a malicious user with access to a very low-latency connection could disrupt the network by responding extremely quickly to any request with some form of false data - perhaps simply by saying that no new block has been hashed yet - and get disproportionate influence over the consensus this way. If HighSpeedCheater can supply information so fast that he responds first 99% of the time to your query, then you need to do 68-69 queries just to get 50/50 odds that you reach a single node that isn't a HighSpeedCheater sockpuppet - and that node itself might be fooled by HighSpeedCheater and unwittingly be repeating false information. Probably unrealistic for an individual to pull this off, but I can see a big, widely distributed system like a botnet or Google being able to do it. Even then, nodes could get around it just by checking everything 140x as much as they used to.

Nodes send an inv message to all peers when they get a new block; if a node gets an inv message for an unknown block, it will always request it (initially from the node that announced it, but I believe it moves on to others if that one fails to respond). So all it takes is one connection to a 'good' node and you'll get the real block chain. Additionally, when a connection is established, each node queries the other for its 'best' block, and asks for it if it's unknown.
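The inv/getdata behaviour described above can be sketched roughly as follows. This is a heavily simplified, hypothetical model (`BlockRelay`, `OnInv`, `OnTimeout`, `OnBlock` are illustrative names, not the client's data structures); the fallback to another announcing peer follows the "I believe it moves on to others" description and is an assumption, not confirmed behaviour.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified block-relay bookkeeping. Block hashes are strings,
// peers are ints; `requests` logs every (peer, hash) getdata we send.
struct BlockRelay {
    std::set<std::string> knownBlocks;                    // blocks we already have
    std::map<std::string, std::vector<int>> announcers;   // peers that announced each unknown block
    std::vector<std::pair<int, std::string>> requests;    // getdata log

    // Peer `from` sent an inv for `hash`.
    void OnInv(int from, const std::string& hash) {
        if (knownBlocks.count(hash)) return;              // already have it: ignore
        std::vector<int>& who = announcers[hash];
        who.push_back(from);
        if (who.size() == 1) requests.emplace_back(from, hash);  // first announcer: ask it
    }

    // The peer we asked never delivered; assumed fallback to the next announcer.
    void OnTimeout(const std::string& hash) {
        std::vector<int>& who = announcers[hash];
        if (!who.empty()) who.erase(who.begin());
        if (!who.empty()) requests.emplace_back(who.front(), hash);
    }

    // The block arrived: remember it and stop tracking announcers.
    void OnBlock(const std::string& hash) {
        knownBlocks.insert(hash);
        announcers.erase(hash);
    }
};
```

The point of the model is that a single honest announcer is enough: as long as one peer keeps advertising the real block, the request eventually reaches someone who can serve it.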

Quote from: Hepatizon on July 19, 2010, 05:56:44 AM

2. Can a 'broadcaster' node influence which nodes it responds to? For example, could Alice set up her system so that every time Bob asks for the most recent block chain, she is the one (or disproportionately more likely to be the one) to answer him? If so, then Bob is basically at Alice's mercy unless he can figure out which addresses Alice is using faster than she can create new sockpuppets.

Again, if there's a single good connection, you should be able to get the real block chain, eventually.
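To see how much "a single good connection" asks for, suppose Alice controls a fraction f of the addresses Bob might connect to. If Bob picks k peers independently and uniformly at random (an idealization), the chance that at least one is honest is 1 - f^k. The helper name and the peer count below are hypothetical.

```cpp
#include <cmath>

// Chance that at least one of k randomly chosen peers is honest, when a
// fraction f of candidate addresses belongs to the attacker. Assumes each
// peer is drawn independently and uniformly.
double AtLeastOneHonest(double f, int k) {
    return 1.0 - std::pow(f, k);
}
```

With a hypothetical 8 peers, an attacker holding 50% of addresses leaves Bob about a 99.6% chance of an honest link, 90% leaves about 57%, and 99% leaves only about 8%, so Alice's sockpuppet share matters far more than how fast she answers.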

Quote from: Hepatizon on July 19, 2010, 05:56:44 AM

There's also the possibility that the entire transaction chain could fork if the latency between any two approximately equal computing groups becomes larger than the average block-creation time. Suppose that there is roughly the same amount of computing power dedicated to hashing Bitcoin blocks in two groups, say the US and Russia. If - and I can only see this happening if the time between new blocks is very short - it takes more time for a US node to ask a Russian node what the longest block chain is (and vice versa) than it takes for the Russian (or US) Bitcoin network to generate a new block, then chains could develop local to Russia and the US. Suppose a node in the US and one in Russia successfully hash the next block at about the same time; call them blocks 1001us and 1001ru. The network resolves the discrepancy by seeing whether 1002us or 1002ru gets hashed first. But since there's a delay in getting information from one country to the other, every node in the US that checks a Russian node for its longest chain length gets out-of-date information, so it will see an older, shorter chain than the chain it gets from local, lower-latency nodes. The same thing happens to the Russians, with the end effect that American nodes keep working on the branch containing 1001us because they can only see an older, shorter version of the 1001ru branch, while the Russians work on theirs for the same reason. As long as new blocks are found faster than information can travel between the two groups, neither chain will ever dominate the other.

Since the block production difficulty is automatically calibrated so that blocks come approximately 10 minutes apart from each other, it's unlikely that latency will ever get that high. In any case, though, all it takes is one or two blocks found in quick succession on one chain to throw things out of balance enough for the network to converge on one chain. This also won't affect normal transactions very much - they'll still go through, and generated coins are unusable until 120 blocks go by (by which point a winner should have emerged). The only risk is that someone might spend the same coin once on each of the two chains, to a different recipient - this could result in some confusion, but if you wait for 10-20 blocks to go by before accepting the transaction as 'confirmed', the risk of a fork should be acceptably low.
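The "one or two blocks in quick succession" resolution follows from the fork-choice rule: as I understand it, the client follows the branch with the most cumulative work (roughly, the sum of each block's difficulty) rather than simply the most blocks. The sketch below models that with hypothetical names (`Branch`, `ChooseBranch`); it is an illustration of the rule, not the client's code.

```cpp
#include <numeric>
#include <string>
#include <vector>

// Simplified fork choice: each branch carries per-block work values
// (proportional to difficulty). A node follows the branch with the most
// cumulative work; on a tie it keeps the branch it already has.
struct Branch {
    std::string name;
    std::vector<double> blockWork;
    double TotalWork() const {
        return std::accumulate(blockWork.begin(), blockWork.end(), 0.0);
    }
};

const Branch& ChooseBranch(const Branch& current, const Branch& other) {
    return other.TotalWork() > current.TotalWork() ? other : current;
}
```

For example, with `Branch us{"1001us", {1.0}}` and `Branch ru{"1001ru", {1.0}}`, a US node stays on 1001us; once a second block lands on the Russian branch, `ChooseBranch` returns 1001ru for nodes on either side, which is how the split collapses.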

Hepatizon (OP)

Re: Messing with queries

July 19, 2010, 02:14:14 PM

#3

Quote from: bdonlan on July 19, 2010, 06:53:04 AM

Quote from: Hepatizon on July 19, 2010, 05:56:44 AM

There's also the possibility that the entire transaction chain could fork if the latency between any two approximately equal computing groups becomes larger than the average block-creation time. Suppose that there is roughly the same amount of computing power dedicated to hashing Bitcoin blocks in two groups, say the US and Russia. If - and I can only see this happening if the time between new blocks is very short - it takes more time for a US node to ask a Russian node what the longest block chain is (and vice versa) than it takes for the Russian (or US) Bitcoin network to generate a new block, then chains could develop local to Russia and the US. Suppose a node in the US and one in Russia successfully hash the next block at about the same time; call them blocks 1001us and 1001ru. The network resolves the discrepancy by seeing whether 1002us or 1002ru gets hashed first. But since there's a delay in getting information from one country to the other, every node in the US that checks a Russian node for its longest chain length gets out-of-date information, so it will see an older, shorter chain than the chain it gets from local, lower-latency nodes. The same thing happens to the Russians, with the end effect that American nodes keep working on the branch containing 1001us because they can only see an older, shorter version of the 1001ru branch, while the Russians work on theirs for the same reason. As long as new blocks are found faster than information can travel between the two groups, neither chain will ever dominate the other.

Since the block production difficulty is automatically calibrated so that blocks come approximately 10 minutes apart from each other, it's unlikely that latency will ever get that high. In any case, though, all it takes is one or two blocks found in quick succession on one chain to throw things out of balance enough for the network to converge on one chain. This also won't affect normal transactions very much - they'll still go through, and generated coins are unusable until 120 blocks go by (by which point a winner should have emerged). The only risk is that someone might spend the same coin once on each of the two chains, to a different recipient - this could result in some confusion, but if you wait for 10-20 blocks to go by before accepting the transaction as 'confirmed', the risk of a fork should be acceptably low.

I don't think this scenario is particularly likely, especially not with ~10 minutes between blocks. It could only happen if the block creation speed were raised considerably, and even then it would require two very evenly matched computing groups. If it were a 52-48 split, I think the 52% group would eventually win out - I need to run some simulations. I'd also need to check how stable the two forks would be, and roughly how long it would take for one to trump the other. So file this under "bizarre doomsday scenario that's nearly impossible and might not even be that bad if it happens." As for block creation time, I could see it having to be reduced substantially. Each block has a maximum size; if there is a constant time between blocks, then the maximum number of transactions that can be processed per unit time is (max block size) / (avg transaction size) / (block creation time). If BC takes off and we approach that number, then either the max block size needs to go up or the block creation time needs to go down. We'd have to choose the second option (shorter block times) many times in a row for this scenario to become possible.
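The throughput bound above is simple enough to write down directly. The function name is hypothetical, and the block and transaction sizes in the usage note are illustrative figures, not values taken from the client.

```cpp
// Upper bound on transactions per second:
//   (max block size / avg transaction size) / block creation time
double MaxTxPerSecond(double maxBlockBytes, double avgTxBytes, double intervalSec) {
    double txPerBlock = maxBlockBytes / avgTxBytes;  // how many transactions fit in one block
    return txPerBlock / intervalSec;                 // spread over the block interval
}
```

With a hypothetical 1,000,000-byte limit and 250-byte transactions, 600-second blocks allow about 6.67 tx/s and 60-second blocks about 66.7 tx/s. The two levers are interchangeable in this formula: doubling the block size gives exactly the same bound as halving the interval, but only the second one shrinks the window for the latency-driven fork described earlier.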

Hepatizon (OP)

Re: Messing with queries

July 19, 2010, 02:52:41 PM

#4

Quote from: bdonlan on July 19, 2010, 06:53:04 AM

Quote from: Hepatizon on July 19, 2010, 05:56:44 AM

So, how does the Bitcoin client determine which nodes to connect to? I haven't gone through the source thoroughly enough to figure it out yet, but as far as I can tell it either asks the IRC channel who is online and queries those nodes, or sends a request to IRC directly and then gets a response from an arbitrary node? I think?

I have two main concerns:
1. Is there any 'first-responder' bias in determining which node to ask for block data from? If there is, then a malicious user with access to a very low-latency connection could disrupt the network by responding extremely quickly to any request with some form of false data - perhaps simply by saying that no new block has been hashed yet - and get disproportionate influence over the consensus this way. If HighSpeedCheater can supply information so fast that he responds first 99% of the time to your query, then you need to do 68-69 queries just to get 50/50 odds that you reach a single node that isn't a HighSpeedCheater sockpuppet - and that node itself might be fooled by HighSpeedCheater and unwittingly be repeating false information. Probably unrealistic for an individual to pull this off, but I can see a big, widely distributed system like a botnet or Google being able to do it. Even then, nodes could get around it just by checking everything 140x as much as they used to.

Nodes send an inv message to all peers when they get a new block; if a node gets an inv message for an unknown block, it will always request it (initially from the node that announced it, but I believe it moves on to others if that one fails to respond). So all it takes is one connection to a 'good' node and you'll get the real block chain. Additionally, when a connection is established, each node queries the other for its 'best' block, and asks for it if it's unknown.

Okay. I was confused by this code in the function ProcessMessage:

Code:

// Ask the first connected node for block updates
static int nAskedForBlocks;  // static: zero-initialized once, keeps its value across calls
if (!pfrom->fClient && (nAskedForBlocks < 1 || vNodes.size() <= 1))
{
    nAskedForBlocks++;
    pfrom->PushGetBlocks(pindexBest, uint256(0));
}

I forgot that static means nAskedForBlocks persists across function calls (and, having static storage duration, starts at zero). If it were an ordinary local variable reset on every call, the client would ask for blocks from every node that sends it a message.
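A standalone way to see the difference is to copy the snippet's condition into two small functions, one with a static counter and one with a plain local. `ShouldAskStatic`, `ShouldAskLocal`, `peerIsClient`, and `nodeCount` are hypothetical stand-ins for the original logic, `pfrom->fClient`, and `vNodes.size()`.

```cpp
#include <cstddef>

// Mirrors the snippet: a function-local static is zero-initialized once and
// keeps its value between calls, so only the first full node is asked
// (unless we have at most one connection).
bool ShouldAskStatic(bool peerIsClient, std::size_t nodeCount) {
    static int nAskedForBlocks;  // persists across calls
    if (!peerIsClient && (nAskedForBlocks < 1 || nodeCount <= 1)) {
        nAskedForBlocks++;
        return true;
    }
    return false;
}

// Same condition with an ordinary local: it is back to 0 on every call,
// so every full node passes the check.
bool ShouldAskLocal(bool peerIsClient, std::size_t nodeCount) {
    int nAskedForBlocks = 0;  // reset on every call
    if (!peerIsClient && (nAskedForBlocks < 1 || nodeCount <= 1)) {
        nAskedForBlocks++;    // this increment is lost when the function returns
        return true;
    }
    return false;
}
```

Calling each function five times in a fresh program with `(false, 3)`, the static version returns true once and then false, while the local version returns true every time.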