Author Topic: Bogus locator in getheaders (test data wanted)  (Read 614 times)
Coinr8d (OP)
Newbie | Activity: 13 | Merit: 17
July 17, 2018, 07:03:55 AM
Last edit: July 21, 2018, 09:09:33 AM by Coinr8d
Merited by theymos (6), gmaxwell (2), qwk (1), ABCbits (1)
#1

I'm trying to understand the getheaders protocol message and how Bitcoin Core operates when it receives one. FindForkInGlobalIndex is called when a locator is present in the message. The function seems to go through all hashes inside the locator and perform a hash table lookup for each to see whether we know the hash.

It seems to me that an attacker can send us this protocol message with bogus hashes, and the only limit I can see is the peer-to-peer network protocol message size limit of 4,000,000 bytes. This translates to roughly 125,000 hashes inside the locator.

Therefore it seems that it is possible for an attacker to make us perform that many hash table lookups while holding the cs_main lock.

Is this really possible, or am I missing something? If it is possible, is it not a denial-of-service vector?
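
For reference, a back-of-the-envelope check of that 125,000 figure (a standalone sketch, not Bitcoin Core code; it assumes the standard getheaders payload layout of a 4-byte version, a compact-size count, the 32-byte locator hashes, and a 32-byte hash_stop):

Code:
#include <cstdio>

int main() {
    const long max_msg   = 4000000;    // the 4,000,000-byte P2P message size limit
    const long overhead  = 4 + 5 + 32; // version + 5-byte compact-size count + hash_stop
    const long per_entry = 32;         // one uint256 block locator hash
    std::printf("max locator entries ~= %ld\n", (max_msg - overhead) / per_entry); // ~124998
}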
aliashraf
Legendary | Activity: 1456 | Merit: 1174
July 17, 2018, 08:24:41 AM
#2

I suppose getblockheader doesn't accept multiple inputs: you send one hash, you get one result, either in JSON or raw hex-serialized format. I don't see any support for requesting headers for multiple block hashes at once. Am I missing something?
Coinr8d (OP)
Newbie | Activity: 13 | Merit: 17
July 17, 2018, 09:06:24 AM
#3

https://en.bitcoin.it/wiki/Protocol_documentation#getheaders - I'm talking about the block locator hashes described there.
aliashraf
Legendary | Activity: 1456 | Merit: 1174
July 17, 2018, 10:05:52 AM
Last edit: July 17, 2018, 10:41:44 AM by aliashraf
#4

I was discussing getblockheader instead of your inquiry about getheaders. My mistake.

Now, your concern is about an SPV node that might send a huge block locator full of bogus hashes, causing the client software to perform pointless exhaustive searches.

A block locator is a special stack supposed to hold around 10*log2(max_block_height) items, with the genesis block's hash on top. The structure is designed to handle (especially) temporary short-range forks. Abusing this structure to attack the node by keeping it busy locating hashes is not possible, because the client software doesn't check all the hashes exhaustively; it tries to locate the fork point (approximately) by performing a special version of binary search on the list.

A safer implementation should also check that the block locator size is acceptable (not too large), I admit.
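
For concreteness, this is roughly the height-selection pattern an honest locator follows: dense near the tip, then exponentially sparser, ending at genesis. A standalone sketch over heights only (the real code walks CBlockIndex entries and collects their hashes; the exact back-off point is taken from the shape of Bitcoin Core's GetLocator, so treat the counts as approximate):

Code:
#include <algorithm>
#include <cstdio>
#include <vector>

// Heights an honest node would reference in a locator for a chain of the given tip height.
std::vector<int> LocatorHeights(int tip_height) {
    std::vector<int> heights;
    int step = 1;
    int h = tip_height;
    while (true) {
        heights.push_back(h);
        if (h == 0) break;                  // genesis reached
        h = std::max(h - step, 0);
        if (heights.size() > 10) step *= 2; // exponential back-off after the first ~10 entries
    }
    return heights;
}

int main() {
    // At a mid-2018 tip height this yields on the order of 30 entries.
    std::printf("locator entries at height 530000: %zu\n", LocatorHeights(530000).size());
}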
Coinr8d (OP)
Newbie | Activity: 13 | Merit: 17
July 17, 2018, 11:15:45 AM
Merited by suchmoon (7)
#5

Thanks for the reply. I admit that my C++ is not very good, so I could be missing or misreading something. However, I can't see what you describe in the code.

In primitives/block.h we have

Code:
struct CBlockLocator
{
    std::vector<uint256> vHave;

    CBlockLocator() {}

    explicit CBlockLocator(const std::vector<uint256>& vHaveIn) : vHave(vHaveIn) {}

    ADD_SERIALIZE_METHODS;

    template <typename Stream, typename Operation>
    inline void SerializationOp(Stream& s, Operation ser_action) {
        int nVersion = s.GetVersion();
        if (!(s.GetType() & SER_GETHASH))
            READWRITE(nVersion);
        READWRITE(vHave);
    }

    // ... (remaining members of the struct omitted in this quote)
};

So it seems there is no filtering during deserialization, i.e. it looks like what comes from the network goes directly into the object instance. So if I read it correctly, if I send 100k hashes in the locator, this object will be initialized with all of them.

Then in validation.cpp we have

Code:
CBlockIndex* FindForkInGlobalIndex(const CChain& chain, const CBlockLocator& locator)
{
    AssertLockHeld(cs_main);

    // Find the latest block common to locator and chain - we expect that
    // locator.vHave is sorted descending by height.
    for (const uint256& hash : locator.vHave) {
        CBlockIndex* pindex = LookupBlockIndex(hash);
        if (pindex) {
            if (chain.Contains(pindex))
                return pindex;
            if (pindex->GetAncestor(chain.Height()) == chain.Tip()) {
                return chain.Tip();
            }
        }
    }
    return chain.Genesis();
}

So that looks to me like we go through all of them, doing a lookup for each, and only do further processing when a hash is actually found.

What I don't see then is where this is implemented:

Quote
because the client software doesn't check all the hashes exhaustively



Coinr8d (OP)
Newbie | Activity: 13 | Merit: 17
July 17, 2018, 01:36:56 PM
#6

Quote
Quote from: Coinr8d
I'm trying to understand the getheaders protocol message and how Bitcoin Core operates when it receives one. FindForkInGlobalIndex is called when a locator is present in the message. The function seems to go through all hashes inside the locator and perform a hash table lookup for each to see whether we know the hash.

It seems to me that an attacker can send us this protocol message with bogus hashes, and the only limit I can see is the peer-to-peer network protocol message size limit of 4,000,000 bytes. This translates to roughly 125,000 hashes inside the locator.

Therefore it seems that it is possible for an attacker to make us perform that many hash table lookups while holding the cs_main lock.

Is this really possible, or am I missing something? If it is possible, is it not a denial-of-service vector?

Not possible,
you can only request 10 unconnecting headers announcements before a DoS limiter kicks in.
The variable (static const int MAX_UNCONNECTING_HEADERS = 10) limiting this is hard-coded in validation.h.

Just do a "grep MAX_UNCONNECTING_HEADERS * -r" to see where it's actually getting limited ... hint: in net_processing.cpp  Wink

Sorry, I fail to see that. The constant is used in ProcessHeadersMessage, which seems to have nothing to do with the processing of the getheaders protocol message (which starts on line 2029).
Evil-Knievel
Legendary | Activity: 1260 | Merit: 1168
July 17, 2018, 01:45:32 PM
#7

Small correction.

The limit of what a node replies to a GETHEADERS message can be found here - in net_processing.cpp:

Code:
        // we must use CBlocks, as CBlockHeaders won't include the 0x00 nTx count at the end
        std::vector<CBlock> vHeaders;
        int nLimit = MAX_HEADERS_RESULTS;
        LogPrint(BCLog::NET, "getheaders %d to %s from peer=%d\n", (pindex ? pindex->nHeight : -1), hashStop.IsNull() ? "end" : hashStop.ToString(), pfrom->GetId());
        for (; pindex; pindex = chainActive.Next(pindex))
        {
            vHeaders.push_back(pindex->GetBlockHeader());
            if (--nLimit <= 0 || pindex->GetBlockHash() == hashStop)
                break;
        }

You can request as much as you want; you will not get back more than MAX_HEADERS_RESULTS headers, nor will the node process more than that.
Coinr8d (OP)
Newbie | Activity: 13 | Merit: 17
July 17, 2018, 01:49:00 PM
#8

Quote from: Evil-Knievel
Small correction.

The limit of what a node replies to a GETHEADERS message can be found here - in net_processing.cpp:

Code:
        // we must use CBlocks, as CBlockHeaders won't include the 0x00 nTx count at the end
        std::vector<CBlock> vHeaders;
        int nLimit = MAX_HEADERS_RESULTS;
        LogPrint(BCLog::NET, "getheaders %d to %s from peer=%d\n", (pindex ? pindex->nHeight : -1), hashStop.IsNull() ? "end" : hashStop.ToString(), pfrom->GetId());
        for (; pindex; pindex = chainActive.Next(pindex))
        {
            vHeaders.push_back(pindex->GetBlockHeader());
            if (--nLimit <= 0 || pindex->GetBlockHash() == hashStop)
                break;
        }

You can request as much as you want; you will not get back more than MAX_HEADERS_RESULTS headers, nor will the node process more than that.

But this code is executed AFTER FindForkInGlobalIndex is called. I'm talking about FindForkInGlobalIndex itself.
Evil-Knievel
Legendary | Activity: 1260 | Merit: 1168
July 17, 2018, 02:02:54 PM
#9

Well, after taking a second look into this, I guess you are completely right. Sorry for my confusing answer earlier.
I see it just as you do: you could probably hang a node for quite a while with this.

Maybe you should reach out to gmaxwell about this?
Coinr8d (OP)
Newbie | Activity: 13 | Merit: 17
July 17, 2018, 02:13:48 PM
#10

Thanks for the review. I think I will wait for more people to check it out and confirm whether this is valid before pushing it further. I'm also unaware of the absolute cost of a single hash table lookup in C++. Still, my gut feeling is that a hundred thousand of those can take some time, and it seems completely unnecessary to allow locators to have more than a hundred items.
Coinr8d (OP)
Newbie | Activity: 13 | Merit: 17
July 17, 2018, 02:31:51 PM
#11

Perhaps also notable: altcoins that forked off Bitcoin a long time ago may still have the maximum network message size set to 32 MB, which makes the problem eight times worse for them.
Coinr8d (OP)
Newbie | Activity: 13 | Merit: 17
July 18, 2018, 01:48:11 PM
#12

Bumping in the hope that more people will check this and either confirm or disprove it.
gmaxwell
Moderator, Legendary | Activity: 4186 | Merit: 8435
July 18, 2018, 11:36:47 PM
Merited by theymos_away (10), suchmoon (7), Foxpup (4), qwk (1), ABCbits (1)
#13

Looking up an entry is O(1) -- just a trivial hashing operation and one or two pointer chases.

So basically what you're saying is that you can make the node do a memory access per 32 bytes sent,  but virtually any network message also does that.  E.g. getblock <random number>.

Locators have no reason to be larger than O(log(blocks)), so indeed it's silly that you can send a big one... but I wouldn't expect it to be interesting. Alternatively, consider the difference between sending 100k at once vs 20 (a totally reasonable number) many times: only a trivial amount of network overhead and perhaps a couple of milliseconds of blocking other peers whose message handling would otherwise be interleaved. If you benchmark it and find that it's significantly slower per byte sent than other arbitrary messages, I'd be interested to hear... but without a benchmark I don't find this interesting enough to check myself.

There are a billion and one things that are conjecturally slow, but most of them are not actually slow (and more than a few things that seem fast are actually slow). Anyone can speculate; testing is what is actually valuable.
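
In that spirit, one way to get a first number without touching the node at all is a standalone micro-benchmark sketch (not Bitcoin Core code): time ~125,000 misses against a hash table sized roughly like the 2018 block index, with std::unordered_map and a cheap 8-byte hasher standing in for mapBlockIndex and its hasher.

Code:
#include <array>
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <random>
#include <unordered_map>
#include <vector>

using Hash256 = std::array<unsigned char, 32>;

// Cheap hasher: take the first 8 bytes of the 32-byte key (a stand-in for the
// node's block-index hasher; the real one may differ).
struct CheapHasher {
    std::size_t operator()(const Hash256& h) const {
        std::uint64_t x;
        std::memcpy(&x, h.data(), sizeof(x));
        return static_cast<std::size_t>(x);
    }
};

static Hash256 RandomHash(std::mt19937_64& rng) {
    Hash256 h;
    for (std::size_t i = 0; i < h.size(); i += 8) {
        std::uint64_t r = rng();
        std::memcpy(h.data() + i, &r, sizeof(r));
    }
    return h;
}

int main() {
    std::mt19937_64 rng(42);

    // ~530k entries, roughly the number of block-index entries in mid-2018.
    std::unordered_map<Hash256, int, CheapHasher> index;
    for (int i = 0; i < 530000; ++i) index.emplace(RandomHash(rng), i);

    // One maximal bogus locator: ~125k random, almost certainly unknown, hashes.
    std::vector<Hash256> bogus;
    for (int i = 0; i < 125000; ++i) bogus.push_back(RandomHash(rng));

    const auto t0 = std::chrono::steady_clock::now();
    std::size_t found = 0;
    for (const auto& h : bogus) found += index.count(h);
    const auto t1 = std::chrono::steady_clock::now();

    const auto us = std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
    std::printf("found %zu, %lld microseconds for %zu lookups\n",
                found, static_cast<long long>(us), bogus.size());
}

Whatever number this prints, the comparison being asked for above is against the cost of receiving and handling any other 4 MB worth of network messages.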
Coinr8d (OP)
Newbie | Activity: 13 | Merit: 17
July 19, 2018, 08:51:42 AM
#14

Quote from: gmaxwell
Looking up an entry is O(1) -- just a trivial hashing operation and one or two pointer chases.

So basically what you're saying is that you can make the node do a memory access per 32 bytes sent, but virtually any network message also does that. E.g. getblock <random number>.

Locators have no reason to be larger than O(log(blocks)), so indeed it's silly that you can send a big one... but I wouldn't expect it to be interesting. Alternatively, consider the difference between sending 100k at once vs 20 (a totally reasonable number) many times: only a trivial amount of network overhead and perhaps a couple of milliseconds of blocking other peers whose message handling would otherwise be interleaved. If you benchmark it and find that it's significantly slower per byte sent than other arbitrary messages, I'd be interested to hear... but without a benchmark I don't find this interesting enough to check myself.

There are a billion and one things that are conjecturally slow, but most of them are not actually slow (and more than a few things that seem fast are actually slow). Anyone can speculate; testing is what is actually valuable.

Thank you for your reply. It sounds reasonable that you want to see actual numbers, so eventually I will try to provide them.

The reason I am not looking at this from the perspective of sending a large number of small messages is that I understand denial-of-service attacks as attacks that require spending significantly fewer resources on one side and significantly more on the other. I believe that would not be the case with many small messages, because the attacker has to send each message and roughly the same amount of work is done on the node's side, plus a single operation such as a lookup. So if the lookup itself is not significantly more expensive than sending the message (and I would guess it is actually the opposite), there is no great disproportion between the work of the attacker and the work of the node.

But in the case of a big block locator, the amount of work on the attacker's side to prepare and send it seems significantly lower than what needs to be done on the node's side. The preparation of the block locator is also amortised if the attacker simply reuses the prepared locator across multiple messages with the same content.

I expect the attacker to be able to construct the block locator cleverly so that each of the hashes in it takes much longer than average to look up. I will need to look into the actual hash table implementation in C++ to find the bucket with the most collisions, and then probably a couple of those big buckets, to mess a little with the CPU caches. This should give the worst possible lookup time for each item.
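
For what it's worth, here is a toy of that last idea (standalone, not Bitcoin Core; it assumes the attacker can guess the table's hash function and bucket count, and it uses plain uint64_t keys and a much smaller table to keep the toy fast): find the most collided bucket in a populated std::unordered_map, then pre-filter random keys so that every locator entry probes that same bucket.

Code:
#include <cstdint>
#include <cstdio>
#include <random>
#include <unordered_map>
#include <vector>

int main() {
    // Populate a table with dummy entries (a small stand-in for the block index).
    std::unordered_map<std::uint64_t, int> index;
    for (std::uint64_t i = 0; i < 100000; ++i) index.emplace(i * 2654435761ULL, 0);

    // Find the bucket that already has the most collisions.
    std::size_t target = 0, longest = 0;
    for (std::size_t b = 0; b < index.bucket_count(); ++b) {
        if (index.bucket_size(b) > longest) { longest = index.bucket_size(b); target = b; }
    }

    // Keep only random keys that would probe that bucket; this filtering is the
    // attacker's one-off (and, as noted above, reusable) preparation cost.
    std::mt19937_64 rng(1);
    std::vector<std::uint64_t> crafted;
    while (crafted.size() < 100) {
        std::uint64_t k = rng();
        if (index.bucket(k) == target) crafted.push_back(k);
    }

    // Every lookup of a crafted key now walks the longest chain in the table.
    std::size_t hits = 0;
    for (std::uint64_t k : crafted) hits += index.count(k);
    std::printf("bucket %zu holds %zu entries; crafted %zu keys aimed at it (hits=%zu)\n",
                target, longest, crafted.size(), hits);
}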
Coinr8d (OP)
Newbie | Activity: 13 | Merit: 17
July 19, 2018, 11:38:22 AM
#15

One more reason why this is not equivalent to sending many small messages: when you send a lot of messages, you acquire the lock for a very short time and then release it, so you give other threads a chance to do their work, whereas when I force you to hold the lock for a longer time, everyone else waiting for that lock is simply waiting.
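
A toy illustration of that point (standalone C++, nothing to do with Bitcoin Core; the durations are arbitrary): a waiter stuck behind one long critical section stalls for the whole duration, while the same total work split into many short critical sections lets it slip in almost immediately.

Code:
#include <chrono>
#include <cstdio>
#include <mutex>
#include <thread>

std::mutex m;

// Spin for roughly the given duration (while the caller holds the lock).
static void BusyFor(std::chrono::microseconds us) {
    const auto end = std::chrono::steady_clock::now() + us;
    while (std::chrono::steady_clock::now() < end) {}
}

// How long does a second thread wait before it can take the lock?
static long long WaiterDelayMs() {
    const auto t0 = std::chrono::steady_clock::now();
    std::lock_guard<std::mutex> g(m);
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count();
}

int main() {
    using namespace std::chrono;

    // One long hold: like processing a single huge locator under the lock.
    std::thread big([] { std::lock_guard<std::mutex> g(m); BusyFor(microseconds(200000)); });
    std::this_thread::sleep_for(milliseconds(10));
    std::printf("waiter behind one 200 ms hold:   %lld ms\n", WaiterDelayMs());
    big.join();

    // The same total work as many short holds: like many small messages.
    std::thread small([] {
        for (int i = 0; i < 1000; ++i) { std::lock_guard<std::mutex> g(m); BusyFor(microseconds(200)); }
    });
    std::this_thread::sleep_for(milliseconds(10));
    std::printf("waiter behind many 0.2 ms holds: %lld ms\n", WaiterDelayMs());
    small.join();
}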
aliashraf
Legendary | Activity: 1456 | Merit: 1174
July 19, 2018, 07:03:09 PM
#16

I just came back to this topic and found I was wrong about the issue the OP has brought to us. I was sure the special stacking technique used to generate the block locator ensures efficiency, but the OP is questioning a DoS vulnerability from arbitrarily stacked bogus values.

And the OP is right! The code deserializes the getheaders message directly into a block locator and exhaustively tries to look up all of the hashes. The most naive approach ever. I'm just surprised  Shocked

Then I see another weird thing: @gmaxwell shows little interest in the OP's review and asks for a kind of proof of labour from the OP's side.  Shocked

Quote from: gmaxwell
Looking up an entry is O(1) -- just a trivial hashing operation and one or two pointer chases.

So basically what you're saying is that you can make the node do a memory access per 32 bytes sent, but virtually any network message also does that. E.g. getblock <random number>.

Locators have no reason to be larger than O(log(blocks)), so indeed it's silly that you can send a big one... but I wouldn't expect it to be interesting. Alternatively, consider the difference between sending 100k at once vs 20 (a totally reasonable number) many times: only a trivial amount of network overhead and perhaps a couple of milliseconds of blocking other peers whose message handling would otherwise be interleaved. If you benchmark it and find that it's significantly slower per byte sent than other arbitrary messages, I'd be interested to hear... but without a benchmark I don't find this interesting enough to check myself.

Being "interesting" or not, this IS a DoS vulnerability, and it could easily be mitigated by enforcing all or a combination of the following controls on light clients' getheaders requests:
1- Check that the length of the supplied block locator is not greater than 10*log2(max_height) + a_safe_not_too_large_threshold (see the sketch just after this list).
2- Check that the difficulty of the supplied hashes is higher than or equal to some_heuristic_nontrivial_safe_value.
3- Check that the first/every 10 hashes are reasonably close in terms of difficulty (they are supposed to be).
4- Blacklist SPV clients that send malicious getheaders requests in a row.
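
A minimal sketch of the check in item 1 (hypothetical, standalone code, not a patch against Bitcoin Core; the function name, the threshold value, and where it would be called from are placeholders):

Code:
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdio>

// Reject locators far larger than 10*log2(max_height) plus some slack.
bool LocatorSizeAcceptable(std::size_t locator_entries, int max_height) {
    const double safe_threshold = 50.0; // "a_safe_not_too_large_threshold", arbitrary here
    const double bound = 10.0 * std::log2(static_cast<double>(std::max(max_height, 2))) + safe_threshold;
    return static_cast<double>(locator_entries) <= bound;
}

int main() {
    std::printf("200 entries at height 530000:    %s\n", LocatorSizeAcceptable(200, 530000) ? "ok" : "reject");
    std::printf("125000 entries at height 530000: %s\n", LocatorSizeAcceptable(125000, 530000) ? "ok" : "reject");
}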

As for other network messages (like getblock): they are not good candidates for such an attack because of the very short period of time for which the lock is held.
Quote from: gmaxwell
There are a billion and one things that are conjecturally slow, but most of them are not actually slow (and more than a few things that seem fast are actually slow).
I don't agree. Any algorithm/code can be correctly analysed and optimized/secured accordingly. No magic.

Quote from: gmaxwell
... Anyone can speculate; testing is what is actually valuable.

Code review is valuable too, and it has an inherent premium compared to tests and benchmarks.
gmaxwell
Moderator, Legendary | Activity: 4186 | Merit: 8435
July 20, 2018, 10:32:59 PM
Merited by spin (10), Foxpup (8)
#17

Quote from: aliashraf
Being "interesting" or not, this IS a DoS vulnerability

No information has been presented so far which supports this claim. It was a reasonable question to ask whether looking up an entry was expensive; if it were, then it would be an issue. But it is not expensive, it is exceptionally cheap.

In fact, sticking a loop that takes cs_main and does 200k lookups each time a network message is received seems to only increase the CPU usage of my bitcoin node from 3% to 10%. Maybe, just maybe, there is some crazy pathological request pattern that makes it meaningfully worse and which somehow doesn't also impact virtually every other message. Maybe. It's always possible. But that is just conjecture. Most conjectures turn out to be untrue, and talk is cheap. Needlessly insulting, sanctimonious talk, doubly so. Some people wonder why few technically competent people frequent these forums anymore, but it isn't hard to see why -- especially when the abuse so often comes from parties whose posting history reveals that they're primarily interested in pumping some altcoin or another.

Code:
diff --git a/src/net_processing.cpp b/src/net_processing.cpp
index 2f3a60406..6aff91a48 100644
--- a/src/net_processing.cpp
+++ b/src/net_processing.cpp
@@ -1558,6 +1558,15 @@ bool static ProcessMessage(CNode* pfrom, const std::string& strCommand, CDataStr
         }
     }
 
+    {
+        LOCK(cs_main);
+        arith_uint256 hash = 0;
+        for(int i=0;i<200000;i++){
+          BlockMap::iterator mi = mapBlockIndex.find(ArithToUint256(hash));
+          hash++;
+        }
+    }
+
     if (strCommand == NetMsgType::REJECT)
     {
         if (LogAcceptCategory(BCLog::NET)) {

Quote from: aliashraf
1- Check that the length of the supplied block locator is not greater than 10*log2(max_height) + a_safe_not_too_large_threshold.
And then nodes that have very few blocks will get stuck. Congratulations, your "fix" for an almost-certain-non-issue broke peers. Safely size-limiting it without the risk of disruption probably requires changing the protocol so the requesting side knows not to make too large a request.

Quote from: aliashraf
2- Check that the difficulty of the supplied hashes is higher than or equal to some_heuristic_nontrivial_safe_value.
3- Check that the first/every 10 hashes are reasonably close in terms of difficulty (they are supposed to be).
Why do you expect your hacker to be honest enough to use actual hashes? He can just use arbitrary low numbers.

Quote from: aliashraf
4- Blacklist SPV clients that send malicious getheaders requests in a row.
If you had any idea which peers were sending "malicious messages", why would you not just block them completely? ... Any kind of "block a peer when it does X which it could reasonably think was a fine thing to do" risks creating a network-wide partitioning attack, by potentially creating ways for attackers to trick nodes into getting themselves banned.

Quote from: aliashraf
... because of the very short period of time for which the lock is held.
You might not be aware, but reading a single block from disk and decoding it into memory should take longer than a hundred thousand memory accesses do.

Quote from: aliashraf
I don't agree. Any algorithm/code can be correctly analysed and optimized/secured accordingly. No magic.
Yes, and it was analyzed here, and the analysis says that it would be surprising if it were actually slow, so it isn't worth any further discussion unless someone finds a reason to think otherwise, such as a test result.
aliashraf
Legendary | Activity: 1456 | Merit: 1174
August 01, 2018, 02:55:14 PM
Last edit: August 01, 2018, 03:26:58 PM by aliashraf
#18

Quote from: gmaxwell
Quote from: aliashraf
Being "interesting" or not, this IS a DoS vulnerability

No information has been presented so far which supports this claim. It was a reasonable question to ask whether looking up an entry was expensive; if it were, then it would be an issue. But it is not expensive, it is exceptionally cheap.

In fact, sticking a loop that takes cs_main and does 200k lookups each time a network message is received seems to only increase the CPU usage of my bitcoin node from 3% to 10%. Maybe, just maybe, there is some crazy pathological request pattern that makes it meaningfully worse and which somehow doesn't also impact virtually every other message. Maybe. It's always possible. But that is just conjecture. Most conjectures turn out to be untrue, and talk is cheap. Needlessly insulting, sanctimonious talk, doubly so. Some people wonder why few technically competent people frequent these forums anymore, but it isn't hard to see why -- especially when the abuse so often comes from parties whose posting history reveals that they're primarily interested in pumping some altcoin or another.
No need for conspiracy theories. Someone reviews the code in good faith, sees some abnormal pattern and reports it; people come and discuss, and the outcome is always good.

As for your 200K loop and the 20% increase in CPU usage: it is huge, imo. With just a few malicious requests this node will be congested.

Quote from: gmaxwell
Quote from: aliashraf
1- Check that the length of the supplied block locator is not greater than 10*log2(max_height) + a_safe_not_too_large_threshold.
And then nodes that have very few blocks will get stuck. Congratulations, your "fix" for an almost-certain-non-issue broke peers. Safely size-limiting it without the risk of disruption probably requires changing the protocol so the requesting side knows not to make too large a request.

You seem to be wrong, or we are discussing different subjects:

A freshly booting node with 0 blocks in hand will send a block locator with just one hash as its payload: the genesis block hash. A node that is almost synchronized will send (at the current height) 190-200 hashes. So a getheaders request with more than 200-250 hashes in its payload is obviously malicious at the current height.
Who is missing something here?
Quote from: gmaxwell
Quote from: aliashraf
2- Check that the difficulty of the supplied hashes is higher than or equal to some_heuristic_nontrivial_safe_value.
3- Check that the first/every 10 hashes are reasonably close in terms of difficulty (they are supposed to be).
Why do you expect your hacker to be honest enough to use actual hashes? He can just use arbitrary low numbers.
I think it would be helpful to force the hacker to work more on his request rather than randomly supplying a nonsense stream of bits.
Quote from: gmaxwell
Quote from: aliashraf
4- Blacklist SPV clients that send malicious getheaders requests in a row.
If you had any idea which peers were sending "malicious messages", why would you not just block them completely? ... Any kind of "block a peer when it does X which it could reasonably think was a fine thing to do" risks creating a network-wide partitioning attack, by potentially creating ways for attackers to trick nodes into getting themselves banned.
I suppose once an SPV client has been fed, say, 2000 block headers, it should continue with a valid block locator to fetch the next 2000 headers, and so on. If an SPV client is continuously sending meaningless requests, it can safely be ignored, at least for a while.
Quote from: gmaxwell
Quote from: aliashraf
... because of the very short period of time for which the lock is held.
You might not be aware, but reading a single block from disk and decoding it into memory should take longer than a hundred thousand memory accesses do.
Of course I'm aware of that  Grin
Yet I believe the lock is not held while a block is being retrieved as a result of getblock, once it has been located. Right?
Quote from: gmaxwell
Quote from: aliashraf
I don't agree. Any algorithm/code can be correctly analysed and optimized/secured accordingly. No magic.
Yes, and it was analyzed here.
No, not yet.
gmaxwell
Moderator, Legendary | Activity: 4186 | Merit: 8435
August 01, 2018, 06:39:11 PM
Last edit: August 01, 2018, 08:31:49 PM by gmaxwell
Merited by Foxpup (6)
#19

Quote from: aliashraf
As for your 200K loop and the 20% increase in CPU usage: it is huge, imo. With just a few malicious requests this node will be congested.
The patch I posted turns _every_ message into a "malicious message", and it only had that modest CPU impact and didn't keep the node from working. This doesn't prove that there is no way to use more resources, of course, but it indicates that it isn't as big an issue as was thought above. Without evidence otherwise, this still looks like it's just not that interesting compared to the hundreds of other ways to make nodes waste resources.

Quote from: aliashraf
I think it would be helpful to force the hacker to work more on his request rather than randomly supplying a nonsense stream of bits.
It does not require "work" to begin the requested numbers with 64 bits of zeros.

Quote from: aliashraf
Yet I believe the lock is not held while a block is being retrieved as a result of getblock, once it has been located. Right?
Sure it is, otherwise it could be pruned out from under the request.

Quote from: aliashraf
So a getheaders request with more than 200-250 hashes in its payload is obviously malicious at the current height.
Who is missing something here?
If you come up and connect to malicious nodes, you can get fed a bogus low-difficulty chain with a lot more height than the honest chain, and as a result produce larger locators without being malicious at all. If peers ban you for that, you'll never converge back from the dummy chain. Similarly, if you are offline a long time and come back, you'll expect a given number of items in the locator, but your peers -- far ahead on the real chain -- will have more than you expected. In both cases the simple "fix" creates a vulnerability. Not the gravest of vulnerabilities, but the issue being fixed doesn't appear -- given testing so far -- especially interesting, so the result would be making things worse.

Quote from: aliashraf
I suppose once an SPV client has been fed, say, 2000 block headers, it should continue
This function doesn't have anything to do with SPV clients in particular. It's how ordinary nodes reconcile their chains with each other. If locators were indeed an SPV-only thing, then I agree that it would be easier to just stick arbitrary limits on them without worrying too much about creating other attacks.
aliashraf
Legendary | Activity: 1456 | Merit: 1174
August 01, 2018, 07:00:24 PM
Last edit: August 02, 2018, 08:26:30 PM by aliashraf
#20

Quote from: gmaxwell
Quote from: aliashraf
As for your 200K loop and the 20% increase in CPU usage: it is huge, imo. With just a few malicious requests this node will be congested.
The patch I posted turns _every_ message into a "malicious message", and it only had that modest CPU impact and didn't keep the node from working.
Yet I'm not convinced, because it's not clear whether it received enough messages. I mean, you should simulate a DDoS situation before drawing further conclusions.

Plus, I think that when it comes to simply improving the code, and we are not in a trade-off situation, always choosing the optimised code is best practice. Your doubts are useful when we are about to cut or downgrade a service because of a hypothetical DoS vulnerability.
Quote from: gmaxwell
Quote from: aliashraf
So a getheaders request with more than 200-250 hashes in its payload is obviously malicious at the current height.
Who is missing something here?
If you come up and connect to malicious nodes, you can get fed a bogus low-difficulty chain with a lot more height than the honest chain, and as a result produce larger locators without being malicious at all. If peers ban you for that, you'll never converge back from the dummy chain. Similarly, if you are offline a long time and come back, you'll expect a given number of items in the locator, but your peers -- far ahead on the real chain -- will have more than you expected. In both cases the simple "fix" creates a vulnerability. Not the gravest of vulnerabilities, but the issue being fixed doesn't appear -- given testing so far -- especially interesting, so the result would be making things worse.

OK, no blacklisting, but we can simply reply with the genesis block header (plus the first 2000 headers) immediately upon receipt of an unreasonably loaded block locator. This could be used by convention by the client as a sign that it has been bootstrapped maliciously. After all, that is what we do eventually anyway; why not just take the shortcut instead of trying to locate 200K dummy hashes?
Deal?

As for honest nodes being offline for a long time, I suppose they should get synced before being a useful peer. And we are talking about a logarithmic function: for the network to have 300 legitimate hashes in a block locator, we would have to reach a block height of 1,000,000,000 or so. That is going to happen some 20 centuries from now.

Quote from: gmaxwell
This function doesn't have anything to do with SPV clients in particular. It's how ordinary nodes reconcile their chains with each other. If locators were indeed an SPV-only thing, then I agree that it would be easier to just stick arbitrary limits on them without worrying too much about creating other attacks.
Full nodes use getblock, AFAIK, to sync; as for getheaders, which is our concern, the Bitcoin wiki suggests:
"The getheaders command is used by thin clients to quickly download the block chain where the contents of the transactions would be irrelevant."
Hence, if a thin client resubmits bogus block locators in a row, it could be safely blacklisted.

Another important measure, for both full and thin clients, would be to check the maximum block height implied by the request against their own clock, so that it is not unreasonably divergent.
That way no client would have an excuse like "I thought we had billions and billions of blocks, so I innocently filled in such a lengthy block locator".