Author Topic: Bogus locator in getheaders (test data wanted)  (Read 614 times)
Coinr8d (OP) | Newbie | Activity: 13 | Merit: 17
July 17, 2018, 07:03:55 AM  #1
Last edit: July 21, 2018, 09:09:33 AM by Coinr8d
Merited by theymos (6), gmaxwell (2), qwk (1), ABCbits (1)

I'm trying to understand the getheaders protocol message and how Bitcoin Core operates when it receives one. FindForkInGlobalIndex is called when the locator is present in the message. This function seems to go through all hashes inside the locator and do a hash table lookup for each to see if we know the hash.

It seems to me that an attacker can send us this protocol message with bogus hashes, and the only limit I can see is the peer-to-peer protocol message size limit of 4,000,000 bytes. That translates to roughly 125,000 hashes inside the locator.

Therefore it seems possible for the attacker to make us perform that many hash table lookups while holding the cs_main lock.

Is this really possible, or am I missing something? If it is possible, isn't it a denial-of-service vector?
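
(Back-of-the-envelope on those numbers: each locator entry is a 32-byte hash and the vector only adds a small CompactSize length prefix, so a 4,000,000-byte message allows roughly 4,000,000 / 32 = 125,000 entries, which is where the figure above comes from.)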
aliashraf | Legendary | Activity: 1456 | Merit: 1174
July 17, 2018, 08:24:41 AM  #2

I suppose getblockheader doesn't accept multiple inputs: you send one hash, you get one result, either in JSON or raw hex-serialized format. I don't see any support for requesting multiple hashes of multiple blocks. Am I missing something?
Coinr8d (OP) | Newbie | Activity: 13 | Merit: 17
July 17, 2018, 09:06:24 AM  #3

https://en.bitcoin.it/wiki/Protocol_documentation#getheaders - I'm talking about the block locator hashes described there.
aliashraf | Legendary | Activity: 1456 | Merit: 1174
July 17, 2018, 10:05:52 AM  #4
Last edit: July 17, 2018, 10:41:44 AM by aliashraf

I was discussing getblockheader rather than getheaders, which your inquiry is about. My mistake.

Now, your concern is about an SPV node that might send a huge block locator with bogus hashes, causing the client software to perform pointless exhaustive lookups.

The block locator is a special stack supposed to have around 10*log2(max_block_height) items, with the genesis block's hash on top. The structure is designed to help with (especially) temporary short-range forks. Abusing this structure to attack the node by keeping it busy locating hashes is not possible, because the client software doesn't check all the hashes exhaustively; it tries to locate the fork point (approximately) by performing a (special version of) binary search on the list.

A safer implementation should also check that the number of locator entries (the block locator size) is acceptable (not too large), I admit.
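
(For reference, here is roughly how the locator described above is built by honest nodes, a simplified paraphrase of CChain::GetLocator in chain.cpp; the step doubles after the first ~10 entries, which is where the ~log2(height) size comes from. Simplified, not the exact source.)

Code:
// Simplified sketch of locator construction (paraphrased from CChain::GetLocator).
// Walk back from the tip: the first ~10 entries step by one block, then the step
// doubles each time, ending with the genesis block. Result size ~= 10 + log2(height).
CBlockLocator CChain::GetLocator(const CBlockIndex* pindex) const
{
    int nStep = 1;
    std::vector<uint256> vHave;
    if (!pindex) pindex = Tip();
    while (pindex) {
        vHave.push_back(pindex->GetBlockHash());
        if (pindex->nHeight == 0) break;                 // genesis reached, stop
        int nHeight = std::max(pindex->nHeight - nStep, 0);
        pindex = pindex->GetAncestor(nHeight);           // jump nStep blocks back
        if (vHave.size() > 10) nStep *= 2;               // exponential back-off
    }
    return CBlockLocator(vHave);
}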
Coinr8d (OP) | Newbie | Activity: 13 | Merit: 17
July 17, 2018, 11:15:45 AM  #5
Merited by suchmoon (7)

Thanks for the reply. I admit that my C++ is not very good, so I could be missing or misreading something. However, I can't see what you describe in the code.

In primitives/block.h we have

Code:
struct CBlockLocator
{
    std::vector<uint256> vHave;

    CBlockLocator() {}

    explicit CBlockLocator(const std::vector<uint256>& vHaveIn) : vHave(vHaveIn) {}

    ADD_SERIALIZE_METHODS;

    template <typename Stream, typename Operation>
    inline void SerializationOp(Stream& s, Operation ser_action) {
        int nVersion = s.GetVersion();
        if (!(s.GetType() & SER_GETHASH))
            READWRITE(nVersion);
        READWRITE(vHave);
    }
    // ... (remaining members of the struct omitted)
};

So it seems there is no filtering on deserialization - i.e. it looks like what comes from the network goes directly into the object instance. So if I read it correctly, if I send 100k hashes in the locator, this object will be initialized with all of them.

Then in validation.cpp we have

Code:
CBlockIndex* FindForkInGlobalIndex(const CChain& chain, const CBlockLocator& locator)
{
    AssertLockHeld(cs_main);

    // Find the latest block common to locator and chain - we expect that
    // locator.vHave is sorted descending by height.
    for (const uint256& hash : locator.vHave) {
        CBlockIndex* pindex = LookupBlockIndex(hash);
        if (pindex) {
            if (chain.Contains(pindex))
                return pindex;
            if (pindex->GetAncestor(chain.Height()) == chain.Tip()) {
                return chain.Tip();
            }
        }
    }
    return chain.Genesis();
}

So that looks to me like we go through all of them, do a lookup for each, and only do further processing when one is found.
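
(For reference, LookupBlockIndex itself is just a find() on mapBlockIndex, an std::unordered_map keyed by block hash, so each of those lookups is a single hash-table probe; roughly, as it stood at the time:)

Code:
// Roughly what LookupBlockIndex does (paraphrased from the source at the time):
// a single unordered_map lookup under cs_main, nullptr if the hash is unknown.
CBlockIndex* LookupBlockIndex(const uint256& hash)
{
    AssertLockHeld(cs_main);
    BlockMap::iterator it = mapBlockIndex.find(hash);
    return it == mapBlockIndex.end() ? nullptr : it->second;
}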

What I don't see then is where this is implemented:

Quote
because the client software doesn't check all the hashes exhaustively



Coinr8d (OP) | Newbie | Activity: 13 | Merit: 17
July 17, 2018, 01:36:56 PM  #6

Quote
Quote
I'm trying to understand the getheaders protocol message and how Bitcoin Core operates when it receives one. FindForkInGlobalIndex is called when the locator is present in the message. This function seems to go through all hashes inside the locator and do a hash table lookup for each to see if we know the hash.

It seems to me that an attacker can send us this protocol message with bogus hashes, and the only limit I can see is the peer-to-peer protocol message size limit of 4,000,000 bytes. That translates to roughly 125,000 hashes inside the locator.

Therefore it seems possible for the attacker to make us perform that many hash table lookups while holding the cs_main lock.

Is this really possible, or am I missing something? If it is possible, isn't it a denial-of-service vector?

Not possible,
you can only request 10 unconnecting headers announcements before a DoS limiter kicks in.
The variable (static const int MAX_UNCONNECTING_HEADERS = 10) limiting this is hard coded in validation.h.

Just do a "grep MAX_UNCONNECTING_HEADERS * -r" to see where it's actually getting limited ... hint: in net_processing.cpp  Wink

Sorry, I fail to see that. The constant is used in ProcessHeadersMessage, which seems to have nothing to do with the processing of the getheaders protocol message (which starts on line 2029).
Evil-Knievel | Legendary | Activity: 1260 | Merit: 1168
July 17, 2018, 01:45:32 PM  #7

Small correction.

The limit of what a node replies to a GETHEADERS message can be found here - in net_processing.cpp:

Code:
 // we must use CBlocks, as CBlockHeaders won't include the 0x00 nTx count at the end
        std::vector<CBlock> vHeaders;
        int nLimit = MAX_HEADERS_RESULTS;
        LogPrint(BCLog::NET, "getheaders %d to %s from peer=%d\n", (pindex ? pindex->nHeight : -1), hashStop.IsNull() ? "end" : hashStop.ToString(), pfrom->GetId());
        for (; pindex; pindex = chainActive.Next(pindex))
        {
            vHeaders.push_back(pindex->GetBlockHeader());
            if (--nLimit <= 0 || pindex->GetBlockHash() == hashStop)
                break;
        }

You can request as much as you want, you will not get back more than MAX_HEADERS_RESULTS nor will the node process more than that.
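
(For reference, MAX_HEADERS_RESULTS is 2000, so a single reply carries at most 2000 headers regardless of the locator size; roughly as declared in the source at the time:)

Code:
// Declared in the source at the time: at most 2000 headers per getheaders reply.
static const unsigned int MAX_HEADERS_RESULTS = 2000;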
Coinr8d (OP) | Newbie | Activity: 13 | Merit: 17
July 17, 2018, 01:49:00 PM  #8

Small correction.

The limit of what a node replies to a GETHEADERS message can be found here - in net_processing.cpp:

Code:
 // we must use CBlocks, as CBlockHeaders won't include the 0x00 nTx count at the end
        std::vector<CBlock> vHeaders;
        int nLimit = MAX_HEADERS_RESULTS;
        LogPrint(BCLog::NET, "getheaders %d to %s from peer=%d\n", (pindex ? pindex->nHeight : -1), hashStop.IsNull() ? "end" : hashStop.ToString(), pfrom->GetId());
        for (; pindex; pindex = chainActive.Next(pindex))
        {
            vHeaders.push_back(pindex->GetBlockHeader());
            if (--nLimit <= 0 || pindex->GetBlockHash() == hashStop)
                break;
        }

You can request as much as you want, you will not get back more than MAX_HEADERS_RESULTS nor will the node process more than that.

But this code is executed AFTER FindForkInGlobalIndex is called. I'm talking about FindForkInGlobalIndex itself.
Evil-Knievel | Legendary | Activity: 1260 | Merit: 1168
July 17, 2018, 02:02:54 PM  #9

Well, after taking a second look into this, I guess you are completely right. Sorry for my confusing answer earlier.
I see it just as you do: you could probably hang a node for quite a while with this.

Maybe you should reach out to gmaxwell about this?
Coinr8d (OP) | Newbie | Activity: 13 | Merit: 17
July 17, 2018, 02:13:48 PM  #10

Thanks for the review. I think I will wait for more people to check it out and confirm whether this is valid before pushing it further. I'm also unaware of the absolute cost of a single hash table lookup in C++. Still, my gut feeling is that a hundred thousand of those can take some time, and it seems completely unnecessary to allow locators to have more than a hundred items.
Coinr8d (OP) | Newbie | Activity: 13 | Merit: 17
July 17, 2018, 02:31:51 PM  #11

Also notable: altcoins that forked from Bitcoin a long time ago may still have the maximum network message size set to 32 MB, which makes the problem eight times worse for them.
Coinr8d (OP) | Newbie | Activity: 13 | Merit: 17
July 18, 2018, 01:48:11 PM  #12

Bumping in the hope that more people will check this and either confirm or disprove it.
gmaxwell | Moderator, Legendary | Activity: 4172 | Merit: 8420
July 18, 2018, 11:36:47 PM  #13
Merited by theymos_away (10), suchmoon (7), Foxpup (4), qwk (1), ABCbits (1)

Looking up an entry is O(1) -- just a trivial hashing operation and one or two pointer chases.

So basically what you're saying is that you can make the node do a memory access per 32 bytes sent,  but virtually any network message also does that.  E.g. getblock <random number>.

Locators have no reason to be larger than O(log(blocks)), so indeed it's silly that you can send a big one... but I wouldn't expect it to be interesting. Alternatively, you could consider what is the difference between sending 100k at once vs 20 (a totally reasonable number) many times? Only a trivial amount of network overhead and perhaps a couple milliseconds of blocking other peers whose message handling would otherwise be interleaved.  If you benchmark it and find out that it's significantly slower per byte sent than other arbitrary messages I'd be interested to hear... but without a benchmark I don't find this interesting enough to check myself.

There are a billion and one things that are conjecturally slow, but most of them are not actually slow (and more than a few things that seem fast are actually slow).  Anyone can speculate, testing is what is actually valuable.
Coinr8d (OP) | Newbie | Activity: 13 | Merit: 17
July 19, 2018, 08:51:42 AM  #14

Looking up an entry is O(1) -- just a trivial hashing operation and one or two pointer chases.

So basically what you're saying is that you can make the node do a memory access per 32 bytes sent,  but virtually any network message also does that.  E.g. getblock <random number>.

Locators have no reason to be larger than O(log(blocks)), so indeed it's silly that you can send a big one... but I wouldn't expect it to be interesting. Alternatively, you could consider what is the difference between sending 100k at once vs 20 (a totally reasonable number) many times? Only a trivial amount of network overhead and perhaps a couple milliseconds of blocking other peers whose message handling would otherwise be interleaved.  If you benchmark it and find out that it's significantly slower per byte sent than other arbitrary messages I'd be interested to hear... but without a benchmark I don't find this interesting enough to check myself.

There are a billion and one things that are conjecturally slow, but most of them are not actually slow (and more than a few things that seem fast are actually slow).  Anyone can speculate, testing is what is actually valuable.

Thank you for your reply. It sounds reasonable that you want to see the actual numbers, so eventually I will try to provide them.

The reason why I am not looking at it from the perspective of sending a large number of small messages is that I understand denial-of-service attacks as attacks which require spending significantly fewer resources on one side and significantly more on the other side. I believe that would not be the case with many small messages, because the attacker needs to send each message, and approximately the same amount of work is done on the node's side, plus a single operation such as a lookup. So if the lookup itself is not significantly more expensive than sending the message (and I guess it is actually the opposite), there is no great disproportion between the work of the attacker and the node.

But in the case of a big block locator, the amount of work on the attacker's side to prepare and send it seems to be significantly lower than what needs to be done on the node's side. Also, the preparation of the block locator is amortised if the attacker just reuses the prepared block locator for sending multiple messages with the same content.

I expect the attacker to be able to construct a block locator smartly so that it takes much more than the average time to look up each of the hashes in it. I will need to look into the actual implementation of the hash table in C++ to be able to find the bucket with the most collisions, and then probably find a couple of those big buckets to mess a little bit with the CPU caches. This should give me the worst possible lookup times for each item.
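
(A minimal standalone sketch of the kind of measurement I have in mind, not using Bitcoin Core itself: fill an unordered_map with a few hundred thousand 32-byte keys to stand in for mapBlockIndex, then time 125,000 misses. All names here are made up for the benchmark, and the hasher only mimics the "cheap hash" idea of using the first 8 bytes of the key.)

Code:
// Standalone benchmark sketch: approximate cost of 125,000 failed lookups in an
// unordered_map keyed by 32-byte hashes (a stand-in for mapBlockIndex).
// Hypothetical test harness, not Bitcoin Core code.
#include <array>
#include <chrono>
#include <cstdint>
#include <cstring>
#include <iostream>
#include <random>
#include <unordered_map>
#include <vector>

using Hash256 = std::array<uint8_t, 32>;

struct CheapHasher {
    // Mimic the "cheap hash" approach: reuse the first 8 bytes of the key directly.
    size_t operator()(const Hash256& h) const {
        uint64_t x;
        std::memcpy(&x, h.data(), sizeof(x));
        return static_cast<size_t>(x);
    }
};

static Hash256 RandomHash(std::mt19937_64& rng) {
    Hash256 h;
    for (size_t i = 0; i < h.size(); i += 8) {
        uint64_t x = rng();
        std::memcpy(h.data() + i, &x, 8);
    }
    return h;
}

int main() {
    std::mt19937_64 rng(42);
    std::unordered_map<Hash256, int, CheapHasher> index;   // stand-in for mapBlockIndex
    for (int i = 0; i < 550000; ++i) index.emplace(RandomHash(rng), i);

    std::vector<Hash256> bogus;                             // "locator" full of unknown hashes
    for (int i = 0; i < 125000; ++i) bogus.push_back(RandomHash(rng));

    auto t0 = std::chrono::steady_clock::now();
    size_t found = 0;
    for (const Hash256& h : bogus) found += index.count(h); // all misses, like FindForkInGlobalIndex
    auto t1 = std::chrono::steady_clock::now();

    std::cout << "found=" << found << " time_ms="
              << std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count()
              << std::endl;
}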
Coinr8d (OP) | Newbie | Activity: 13 | Merit: 17
July 19, 2018, 11:38:22 AM  #15

One more reason why this is not equivalent to sending many small messages: when you send a lot of messages, you acquire the lock for a very short time and then release it, so you give other threads a chance to do their work, whereas when I force you to hold the lock for a longer time, everyone else waiting for this lock is just waiting.
aliashraf | Legendary | Activity: 1456 | Merit: 1174
July 19, 2018, 07:03:09 PM  #16

I just came back to this topic and found I was wrong about the issue OP has brought to us. I was sure the special stacking technique used to generate the block locator ensures efficiency, but OP is questioning a DoS vulnerability from arbitrarily stacked bogus values.

And OP is right! The code directly deserializes the getheaders message into a block locator and exhaustively tries to look up all of the hashes. The most naive approach ever. I'm just surprised  Shocked

Then I see another weird thing: @gmaxwell shows little interest in OP's review and asks for a kind of Proof of Labour or something from OP's side.  Shocked

Looking up an entry is O(1) -- just a trivial hashing operation and one or two pointer chases.

So basically what you're saying is that you can make the node do a memory access per 32 bytes sent,  but virtually any network message also does that.  E.g. getblock <random number>.

Locators have no reason to be larger than O(log(blocks)), so indeed it's silly that you can send a big one... but I wouldn't expect it to be interesting. Alternatively, you could consider what is the difference between sending 100k at once vs 20 (a totally reasonable number) many times? Only a trivial amount of network overhead and perhaps a couple milliseconds of blocking other peers whose message handling would otherwise be interleaved.  If you benchmark it and find out that it's significantly slower per byte sent than other arbitrary messages I'd be interested to hear... but without a benchmark I don't find this interesting enough to check myself.

Being "interesting" or not, this IS a DoS vulnerability and could be easily mitigated by enforcing all or a combination of the following controls on the light clients' getheaders request:
1- Check the length of the supplied block locator not to be greater than 10*Log2(max_height) + a_safe_not-too_large_threshold
2- Check the difficulty of the supplied hashes to be higher than or equal to some_heuristic_nontrivial_safe_value
3- Check first/every 10 hashes to be reasonably close in terms of difficulty(they are supposed to be).
4- Black list spv clients who send malicious getheaders requests in a row.
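
(A minimal sketch of check (1) above, assuming the handler knows the current chain height; the helper name and the slack constant are mine, not existing Bitcoin Core code.)

Code:
// Hypothetical sketch of control (1): ignore getheaders whose locator is
// implausibly long for the current chain height. Illustrative only.
#include <algorithm>
#include <cmath>
#include <cstddef>

size_t MaxPlausibleLocatorSize(int nChainHeight)
{
    const size_t SLACK = 32;  // "a_safe_not-too_large_threshold", arbitrary in this sketch
    int h = std::max(nChainHeight, 2);
    return 10 * static_cast<size_t>(std::ceil(std::log2(static_cast<double>(h)))) + SLACK;
}

// In the getheaders handler, before FindForkInGlobalIndex:
//     if (locator.vHave.size() > MaxPlausibleLocatorSize(chainActive.Height()))
//         return true;  // ignore the request instead of walking the whole locator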

As for other network messages (like getblock): they are not good candidates for such an attack because of the very short period of time the lock is held.
Quote
There are a billion and one things that are conjecturally slow, but most of them are not actually slow (and more than a few things that seem fast are actually slow).  
I don't agree. Any algorithm/code can correctly be analysed and optimized/secured accordingly. No magics.

Quote
... Anyone can speculate, testing is what is actually valuable.

Code review is valuable too, and it has an inherent premium compared to tests and benchmarks.
gmaxwell | Moderator, Legendary | Activity: 4172 | Merit: 8420
July 20, 2018, 10:32:59 PM  #17
Merited by spin (10), Foxpup (8)

Quote
Being "interesting" or not, this IS a DoS vulnerability

No information has been presented so far which supports this claim.  It was a reasonable question to ask whether looking up an entry was expensive; if it were, then it would be an issue. But it is not expensive, it is exceptionally cheap.

In fact, sticking a loop that takes cs_main and does 200k lookups each time a network message is received seems to only increase CPU usage of my bitcoin node from 3% to 10%.  Maybe just maybe there is some crazy pathological request pattern that makes it meaningfully worse and which somehow doesn't also impact virtually every other message. Maybe. It's always possible. But that is just conjecture.  Most conjectures turn out to be untrue, talk is cheap. Needlessly insulting sanctimonious talk, doubly so.  Some people wonder why few technically competent people frequent these forums anymore, but it isn't hard to see why-- especially when the abuse seems to come so often from parties whose posting history reveals that they're primarily interested in pumping some altcoin or another.

Code:
diff --git a/src/net_processing.cpp b/src/net_processing.cpp
index 2f3a60406..6aff91a48 100644
--- a/src/net_processing.cpp
+++ b/src/net_processing.cpp
@@ -1558,6 +1558,15 @@ bool static ProcessMessage(CNode* pfrom, const std::string& strCommand, CDataStr
         }
     }
 
+    {
+        LOCK(cs_main);
+        arith_uint256 hash = 0;
+        for(int i=0;i<200000;i++){
+          BlockMap::iterator mi = mapBlockIndex.find(ArithToUint256(hash));
+          hash++;
+        }
+    }
+
     if (strCommand == NetMsgType::REJECT)
     {
         if (LogAcceptCategory(BCLog::NET)) {

Quote
1- Check the length of the supplied block locator not to be greater than 10*Log2(max_height) + a_safe_not-too_large_threshold
And then nodes that have very few blocks will get stuck. Congratulations, your "fix" for an almost-certain-non-issue broke peers. Safely size limiting it without the risk of disruption probably requires changing the protocol so the requesting side knows to not make too large a request.

Quote
2- Check the difficulty of the supplied hashes to be higher than or equal to some_heuristic_nontrivial_safe_value
3- Check first/every 10 hashes to be reasonably close in terms of difficulty(they are supposed to be).
Why do you expect your hacker to be honest enough to use actual hashes? He can just use arbitrary low numbers.

Quote
4- Black list spv clients who send malicious getheaders requests in a row.
If you had any idea which peers were sending "malicious messages" why would you not just block them completely?  ... Any kind of "block a peer when it does X which it could reasonably think was a fine thing to do" risk creating a network wide partitioning attack by potentially creating ways for attackers to trick nodes into getting themselves banned.

Quote
of the very short period of time the lock is held.
You might not be aware, but reading a single block from disk and decoding it into memory should take longer than a hundred thousand memory accesses take.

Quote
I don't agree. Any algorithm/code can correctly be analysed and optimized/secured accordingly. No magics.
Yes, and it was analyzed here, and the analysis says that it would be surprising if it were actually slow, so it isn't worth any further discussion unless someone  finds a reason to think otherwise, such as a test result.  
aliashraf | Legendary | Activity: 1456 | Merit: 1174
August 01, 2018, 02:55:14 PM  #18
Last edit: August 01, 2018, 03:26:58 PM by aliashraf

Quote
Being "interesting" or not, this IS a DoS vulnerability

No information has been presented so far which supports this claim.  It was a reasonable question to ask whether looking up an entry was expensive; if it were, then it would be an issue. But it is not expensive, it is exceptionally cheap.

In fact, sticking a loop that takes cs_main and does 200k lookups each time a network message is received seems to only increase CPU usage of my bitcoin node from 3% to 10%.  Maybe just maybe there is some crazy pathological request pattern that makes it meaningfully worse and which somehow doesn't also impact virtually every other message. Maybe. It's always possible. But that is just conjecture.  Most conjectures turn out to be untrue, talk is cheap. Needlessly insulting sanctimonious talk, doubly so.  Some people wonder why few technically competent people frequent these forums anymore, but it isn't hard to see why-- especially when the abuse seems to come so often from parties whose posting history reveals that they're primarily interested in pumping some altcoin or another.
No need for conspiracy theories. Someone in good faith reviews the code, sees some abnormal pattern and reports it; people come and discuss, and the outcome is always good.

As for your 200K loop and the 20% increase in cpu usage: it is huge, imo. With just a few malicious requests this node will be congested.

Quote
Quote
1- Check the length of the supplied block locator not to be greater than 10*Log2(max_height) + a_safe_not-too_large_threshold
And then nodes that have very few blocks will get stuck. Congratulations, your "fix" for an almost-certain-non-issue broke peers. Safely size limiting it without the risk of disruption probably requires changing the protocol so the requesting side knows to not make too large a request.

You seem to be wrong, or we are discussing different subjects:

A freshly booting node with 0 blocks in hand will send a block locator with just one hash as its payload: the genesis block hash. A node that is almost synchronized will send (at the current height) 190-200 hashes. So a getheaders request with more than 200-250  hashes as its payload is obviously malicious for the current height.
Who is missing something here?
Quote
Quote
2- Check the difficulty of the supplied hashes to be higher than or equal to some_heuristic_nontrivial_safe_value
3- Check first/every 10 hashes to be reasonably close in terms of difficulty(they are supposed to be).
Why do you expect your hacker to be honest enough to use actual hashes? He can just use arbitrary low numbers.
I think it will be helpful to force the hacker to work more on its request rather than randomly supply a nonsense stream of bits.
Quote
Quote
4- Black list spv clients who send malicious getheaders requests in a row.
If you had any idea which peers were sending "malicious messages" why would you not just block them completely?  ... Any kind of "block a peer when it does X which it could reasonably think was a fine thing to do" risk creating a network wide partitioning attack by potentially creating ways for attackers to trick nodes into getting themselves banned.
I suppose once an SPV client has been fed, say, 2000 block headers, it should continue with a valid block locator to fetch the next 2000 blocks, and so on. If an SPV client is continuously sending meaningless requests, it can safely be ignored, at least for a while.
Quote
Quote
of the very short period of time the lock is held.
You might not be aware, but reading a single block from disk and decoding it into memory should take longer than a hundred thousand memory accesses take.
Of course I'm aware of that  Grin
Yet, I believe the lock is not held when a block is to be retrieved as a result of getblock, once it has been located. Right?
Quote
Quote
I don't agree. Any algorithm/code can correctly be analysed and optimized/secured accordingly. No magics.
Yes, and it was analyzed here.
No,  Not yet.
gmaxwell | Moderator, Legendary | Activity: 4172 | Merit: 8420
August 01, 2018, 06:39:11 PM  #19
Last edit: August 01, 2018, 08:31:49 PM by gmaxwell
Merited by Foxpup (6)

As for your 200K loop and the 20% increase in cpu usage: it is huge, imo. With just a few malicious requests this node will be congested.
The patch I posted turns _every_ message into a "malicious message" and it only had that modest cpu impact, and didn't keep the node from working.  This doesn't prove that there is no way to use more resources, of course, but it indicates that it isn't a big issue as was thought above. Without evidence otherwise this still looks like it's just not that interesting compared to the hundreds of other ways to make nodes waste resources.

Quote
I think it will be helpful to force the hacker to work more on its request rather than randomly supply a nonsense stream of bits.
It does not require "work" to begin requested numbers with 64 bits of zeros.

Quote
Yet, I believe the lock is not held when a block is to be retrieved as a result of getblock, once it has been located. Right?
Sure it is, otherwise it could be pruned out from under the request.

Quote
So a getheaders request with more than 200-250  hashes as its payload is obviously malicious for the current height.
Who is missing something here?
If you come up and connect to malicious nodes, you can get fed a bogus low difficulty chain with a lot more height than the honest chain, and as a result produce larger locators without you being malicious at all. If peers ban you for that, you'll never converge back from the dummy chain.   Similarly, if you are offline a long time, and come back you'll expect a given number of items in the locator, but your peers-- far ahead on the real chain, will have more than you expected.   In both cases the simple "fix" creates a vulnerability, not the gravest of vulnerabilities, but the issue being fixed doesn't appear-- given testing so far-- especially interesting so the result would be making things worse,  

Quote
I suppose once an SPV client has been fed, say, 2000 block headers, it should continue
This function doesn't have anything to do with SPV clients, in particular. It's how ordinary nodes reconcile their chains with each other. If locators were indeed an SPV-only thing, then I agree that it would be easier to just stick arbitrary limits on them without worrying too much about creating other attacks.
aliashraf | Legendary | Activity: 1456 | Merit: 1174
August 01, 2018, 07:00:24 PM  #20
Last edit: August 02, 2018, 08:26:30 PM by aliashraf

As for your 200K loop and the 20% increase in cpu usage: it is huge, imo. With just a few malicious requests this node will be congested.
The patch I posted turns _every_ message into a "malicious message" and it only had that modest cpu impact, and didn't keep the node from working.
Yet, I'm not convinced, since it isn't clear whether it receives enough messages or not. I mean, you should simulate a DDoS situation before drawing further conclusions.

Plus, I think that when it comes to simply improving the code, and we are not in a trade-off situation, always choosing the optimised code is best practice. Your doubts are useful when we are about to cut or downgrade a service because of a hypothetical DoS vulnerability.
Quote
Quote
So a getheaders request with more than 200-250  hashes as its payload is obviously malicious for the current height.
Who is missing something here?
If you come up and connect to malicious nodes, you can get fed a bogus low difficulty chain with a lot more height than the honest chain, and as a result produce larger locators without you being malicious at all. If peers ban you for that, you'll never converge back from the dummy chain.   Similarly, if you are offline a long time, and come back you'll expect a given number of items in the locator, but your peers-- far ahead on the real chain, will have more than you expected.   In both cases the simple "fix" creates a vulnerability, not the gravest of vulnerabilities, but the issue being fixed doesn't appear-- given testing so far-- especially interesting so the result would be making things worse,  

OK. No blacklisting, but we can simply reply with the genesis block header (plus the first 2000 headers) immediately after receipt of an unreasonably loaded block locator. This can be used, by convention, by the client as a sign that it has been bootstrapped maliciously.  After all, it is what we do eventually anyway, so why not just take the shortcut instead of trying to locate 200K dummy hashes?
Deal?

As for honest nodes being offline for a long time, I suppose they should get synced before being a useful peer.  And we are talking about a logarithmic function: for the network to have 300 legitimate hashes in the block locator we would have to reach a block height of 1,000,000,000 or so. That is going to happen something like 20 centuries from now.

Quote
This function doesn't have anything to do with SPV clients, in particular. It's how ordinary nodes reconcile their chains with each other. If locators were indeed an SPV-only thing, then I agree that it would be easier to just stick arbitrary limits on them without worrying too much about creating other attacks.
Full nodes use getblock, AFAIK, to sync; as for getheaders, which is our concern, the Bitcoin wiki suggests:
The getheaders command is used by thin clients to quickly download the block chain where the contents of the transactions would be irrelevant
Hence, if a thin client resubmits bogus block locators in a row, it could be blacklisted safely.

Another important measure, for both full and thin clients, would be checking the maximum block height against their own clock, so that it is not unreasonably divergent.
This way no client would have an excuse like 'I thought we had billions upon billions of blocks, so I innocently filled such a lengthy block locator'.
gmaxwell | Moderator, Legendary | Activity: 4172 | Merit: 8420
August 02, 2018, 09:25:26 PM  #21
Last edit: August 05, 2018, 05:01:36 PM by gmaxwell

Full nodes use getblock, AFAIK, to sync; as for getheaders
Getblock is only used on known header-connected candidate best chains, and has been for many years now. In other words, nodes will never speculatively request blocks anymore; they only request blocks that would be the best chain per the headers, assuming the blocks are valid. (Moreover, the behavior being discussed is the same for both messages.)
aliashraf | Legendary | Activity: 1456 | Merit: 1174
August 03, 2018, 06:44:50 AM  #22
Last edit: August 03, 2018, 05:09:45 PM by aliashraf

A conclusive proposal for this issue:

1- I suggest that when there is no drawback, choosing more optimized code is best practice. In this case, improving the code can be done without such a drawback and is recommended.

2- The length of a legit block locator is always less than or equal to
L = 10*ceil(log2(N))
where N is the max (feasible) block height for the chain. N can conventionally be computed like this (a sketch of this computation follows at the end of this post):

N = (current_system_clock - timestamp_of_genesis_block) / 1000/60/10 + 60480
Here N is an upper bound for any suggested fork of the blockchain (60480 is used for 30 consecutive difficulty adjustments, like 14 months)

My intuitive assumption is that no legit node will suggest/accept forks with a chain much longer than what 10 minutes block time imposes. Although the longest chain rule is to be used when nodes are deciding between forks, there should be an upper bound for circumstances in which this policy is applicable, and should be dropped when a suggested fork length exceeds that upper bound.

One may theoretically find it feasible to have a chain relatively longer than expected, because the difficulty adjustment may fail to stabilize the network block production rate, but that would be temporary, and the inflow of mining power cannot continue to grow exponentially.

IOW, each client should be able to calculate an upper bound for the number of blocks present in the chain they are committed to, and hence an upper bound for the number of hashes present in sent/received block locators, regardless of how fairly it has been bootstrapped.

3- Any client that sends an unreasonable number of hashes in its block locator should be considered an attack agent and treated accordingly. No honest client is allowed to accept/relay a fork with unreasonable height.
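
(A minimal sketch of the bound from point 2, assuming a Unix clock in seconds, so the divisor becomes 600 seconds per target block interval rather than the 1000*60*10 milliseconds in the formula above; the function name is mine, not existing code.)

Code:
// Hypothetical sketch of the proposed height bound from point 2, assuming
// Unix-second timestamps. Not existing Bitcoin Core code.
#include <cstdint>

int64_t MaxFeasibleHeight(int64_t now_unix_seconds)
{
    const int64_t GENESIS_TIME   = 1231006505; // block 0 timestamp, 2009-01-03 18:15:05 UTC
    const int64_t TARGET_SPACING = 600;        // 10-minute target block interval, in seconds
    const int64_t SAFETY_MARGIN  = 60480;      // ~30 difficulty periods (30 * 2016 blocks)
    return (now_unix_seconds - GENESIS_TIME) / TARGET_SPACING + SAFETY_MARGIN;
}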


gmaxwell | Moderator, Legendary | Activity: 4172 | Merit: 8420
August 03, 2018, 10:55:52 PM  #23
Merited by Foxpup (3)

My intuitive assumption is that no legit node will suggest/accept forks with a chain much longer than what 10 minutes block time imposes.
No existing software that I'm aware of meets your assumption, they absolutely will do so. It's not at all clear to me how one could go about constructing a rule to not do so that would be guaranteed to not create convergence failures at least in some maliciously created situations.

Quote
Although the longest chain rule is to be used when nodes are deciding between forks, there should be an upper bound for circumstances in which this policy is applicable, and should be dropped when a suggested fork length exceeds that upper bound.
I think you might have buried the headline a bit here: Your conclusive solution has suggested a reworking of the whole consensus algorithm to ignore its primary convergence criteria, and replace it with some kind of "ignore the longest chain under some circumstances", but you aren't even specifying which circumstances or the exact nature of the change... making it impossible to review in any detail.

I don't at all doubt that _some_ reasonable ones could be cooked up...  but why would it be worth the effort to review such a potentially risky change, when the thing being fixed is merely "someone can use a lot of bandwidth to temporarily drive your CPU usage higher"?   -- especially when there are a lot of ways to do that, e.g. set a bloom filter to match almost nothing and scan a lot of blocks... that has very low bandwidth per CPU burned.

But I also think your suggestion is overkill:  If instead there is a BIP that says that locators should never have more than 10+10*ceil(log2(expected blocks)), then the nodes creating locators can determine if their locators would be too big and instead make their points further apart to stay within the limit (and presumably one less than it, to deal with time-skew).  No change in consensus required, nodes can still accept all the blocks they want. If you are on some many-block crazy fork, you'll request a sparser locator, and as a result just waste a bit more bandwidth getting additional headers that you already know.  --- and that's what I was attempting to suggest upthread: that the safe change is to first get honest requesters to limit themselves; then later you can ignore requests that are too big.
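
(For scale: with expected blocks around 535,000, ceil(log2(535,000)) = 20, so that cap works out to 10 + 10*20 = 210 locator entries, a tiny fraction of the ~125,000 hashes that fit in a 4 MB message.)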
aliashraf | Legendary | Activity: 1456 | Merit: 1174
August 04, 2018, 12:54:44 PM  #24

My intuitive assumption is that no legit node will suggest/accept forks with a chain much longer than what 10 minutes block time imposes.
No existing software that I'm aware of meets your assumption, they absolutely will do so. It's not at all clear to me how one could go about constructing a rule to not do so that would be guaranteed to not create convergence failures at least in some maliciously created situations.
I'm aware that no Bitcoin client has implemented such a criterion to be met by forks (being within reasonable thresholds of length); they should, imo.

Quote
Quote
Although the longest chain rule is to be used when nodes are deciding between forks, there should be an upper bound for circumstances in which this policy is applicable, and should be dropped when a suggested fork length exceeds that upper bound.
I think you might have buried the headline a bit here: Your conclusive solution has suggested a reworking of the whole consensus algorithm to ignore its primary convergence criteria, and replace it with some kind of "ignore the longest chain under some circumstances", but you aren't even specifying which circumstances or the exact nature of the change... making it impossible to review in any detail.
Obviously, the criteria is:
Any fork, being the longest chain or not, is not allowed to exceed a computable threshold in length such that it would compromise the block generation rate imposed by other consensus rules.

Specifically, the expected number of blocks should always be less than normal_number_of_blocks_for_this_time + a_safe_threshold

I've already suggested:
Quote
N = (current_system_clock - timestamp_of_genesis_block) / 1000/60/10 + 60480

Here N is an upper bound for any suggested fork of the blockchain (60480 is used for 30 consecutive difficulty adjustments, like 14 months)

So we have a solid definition, provably safe and helpful in mitigating the DoS vulnerability under consideration here. I'm trying to understand why we should hesitate or use a more conservative approach like what you suggest:
Quote
If instead there is a BIP that says that locators should never have more than 10+10*ceil(log2(expected blocks)), then the nodes creating locators can determine if their locators would be too big and instead make their points further apart to stay within the limit (and presumably one less than it, to deal with time-skew).  No change in consensus required, nodes can still accept all the blocks they want. If you are on some many-block crazy fork, you'll request a sparser locator, and as a result just waste a bit more bandwidth getting additional headers that you already know.  --- and that's what I was attempting to suggest upthread: that the safe change is to first get honest requesters to limit themselves; then later you can ignore requests that are too big.
If the block locator rule is applicable, then one has already calculated the #expected_blocks, and so is primarily capable of avoiding chains exceeding that threshold.

Quote

I don't at all doubt that _some_ reasonable ones could be cooked up...  but why would it be worth the effort to review such a potentially risky change, when the thing being fixed is merely "someone can use a lot of bandwidth to temporarily drive your CPU usage higher"?   -- especially when there are a lot of ways to do that, e.g. set a bloom filter to match almost nothing and scan a lot of blocks... that has very low bandwidth per CPU burned.
I think the importance of the proposed criterion goes beyond just dealing with this specific DoS threat. Don't you?

I mean, messing with the holy Longest Chain Rule and putting a cap on it is very inspiring.

I see a lot of potential here.   Cool
For instance, one could investigate how it would help to resist long-range and bootstrap attacks more efficiently, or how narrow the safe_threshold could be set, etc.
gmaxwell | Moderator, Legendary | Activity: 4172 | Merit: 8420
August 04, 2018, 10:11:31 PM  #25
Last edit: August 04, 2018, 10:28:06 PM by gmaxwell
Merited by Foxpup (9), suchmoon (7)

Obviously, the criteria is:
Any fork, being the longest chain or not, is not allowed to exceed a computable threshold in length such that it would compromise the block generation rate imposed by other consensus rules.
Taken strictly that criteria is violated by the existing chain. In other words, if it were imposed previously the network would have already faulted. Block 0 is timestamp 2009-01-03 18:15:05, this was 3501 days ago, which implies ~504144 blocks. Yet we are on 535244.
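
(Checking the arithmetic: 3501 days * 144 blocks/day = 504,144 expected blocks, and 535,244 - 504,144 = 31,100 blocks ahead of schedule -- already more than half of the proposed 60,480 margin.)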

Any increase in hashrate will result in that condition, implemented strictly, being violated because the difficulty adjustment is retrospective and only proportional (it includes no integral component).  If the condition is violated it can cause spontaneous consensus failure by forcing blocktimes against the maximum time skew and causing the network to break into islands based on small differences in their local clocks. You can add margin so that it hasn't failed _yet_ on the current chain but what reason do you have to expect that it wouldn't fail at some point in the future for any given selection of margin?  There is no amount which the chain is guaranteed by construction to not get ahead by. Quite the opposite, under the simple assumption that hashrate will increase over time the chain is guaranteed to get ahead.

Quote
So we have a solid definition, provably safe
As far as I can see your posts contain no such proof. In fact, the existing chain being well ahead of the metric now is concrete proof that some arbitrary safety margin values are not safe. What reason do you have to believe another arbitrary value is safe given that some are provably not, beyond "this setting hasn't failed with the historic chain yet"?

Had your same series of reasoning been followed in 2012-- and set a margin of 1000 blocks (or whatever was twice the adequate value at that point in time) then sometime in the next year after the hashrate grew tremendously the network would have spontaneously broken into multiple chains.

Quote
and helpful in mitigating the DoS vulnerability under consideration here
What DOS vulnerability? There isn't one as far as any measurements have thus far determined.  Changing unrelated consensus rules in ways which only seem to satisfy a "doesn't fail yet on the existing chain" level of proof to fix a maybe service could be denied here node behavior seems, frankly, absurd.  Additionally, it seems just kind of lazy to the point of reckless indifference:  "I can't be bothered to implement some P2P within the performance envelope I desire, though it clearly can be through purely local changes, so I'm going to insist that the core consensus algorithm be changed."

If your concern is that this might hold a lock too long (though measurement shows that a node keeps working fine even slamming 200k lookups on every network message, with the code previously posted on the thread) -- then just change the message handling to drop the blinking lock between groups of entries!   In what universe does it make sense to address a simple implementation programming question through speculative consensus rules with arbitrary parameters that would provably have failed already for some choices of the arbitrary parameters?

Quote
I'm trying to understand why we should hesitate or use a more conservative approach like what you suggest
Because it's trivial to prove that it's safe and doesn't require changes to the bitcoin consensus rules which might open the system up to attack or spontaneous failure, and can't create new node isolation vulnerabilities, etc.  Why? Because some of us don't think of Bitcoin as a pet science project, understand that it's difficult to reason about the full extent of changes, and actually care if it survives.  If you share these positions you should reconsider your arguments, because I don't think they make sense in light of them.
aliashraf | Legendary | Activity: 1456 | Merit: 1174
August 05, 2018, 12:35:18 AM  #26
Last edit: August 05, 2018, 01:05:13 AM by aliashraf

Obviously, the criteria is:
Any fork, being the longest chain or not, is not allowed to exceed a computable threshold in length such that it would compromise the block generation rate imposed by other consensus rules.
Taken strictly that criteria is violated by the existing chain. In other words, if it were imposed previously the network would have already faulted. Block 0 is timestamp 2009-01-03 18:15:05, this was 3501 days ago, which implies ~504144 blocks. Yet we are on 535244.
According to my suggested threshold of 60480, it doesn't fail.

Quote
Any increase in hashrate will result in that condition, implemented strictly, being violated because the difficulty adjustment is retrospective and only proportional (it includes no integral component).  If the condition is violated it can cause spontaneous consensus failure by forcing blocktimes against the maximum time skew and causing the network to break into islands based on small differences in their local clocks. You can add margin so that it hasn't failed _yet_ on the current chain but what reason do you have to expect that it wouldn't fail at some point in the future for any given selection of margin?  There is no amount which the chain is guaranteed by construction to not get ahead by. Quite the opposite, under the simple assumption that hashrate will increase over time the chain is guaranteed to get ahead.
In the last decade, starting from zero and reaching around 45,000 petahashes per second, experiencing GPU, FPGA and ASIC mining one after another, we got just 35000 more blocks than expected. In the next decade, with such hash power as the inertia, we don't expect even half of such a deviation; more importantly, it won't be a jump over a few days or weeks, it will be incremental and observable, and a simple increase in the threshold would be enough to resolve the problems you are worried about.
Quote
Quote
So we have a solid definition, provably safe
As far as I can see your posts contain no such proof. In fact, the existing chain being well ahead of the metric now is concrete proof that some arbitrary safety margin values are not safe. What reason do you have to believe another arbitrary value is safe given that some are provably not, beyond "this setting hasn't failed with the historic chain yet"?
It takes some time for me to prepare a formal mathematical proof that there is always a safe threshold for a large enough duration of time.

Quote
Had your same series of reasoning been followed in 2012-- and set a margin of 1000 blocks (or whatever was twice the adequate value at that point in time) then sometime in the next year after the hashrate grew tremendously the network would have spontaneously broken into multiple chains.
The suggested threshold of 60480 is not calculated as a fraction of current block height, it is the number of blocks in 30 consecutive difficulty adjustments. So in 2012 I'd suggest the same threshold.

Quote

Quote
and helpful in mitigating the DoS vulnerability under consideration here
What DOS vulnerability? There isn't one as far as any measurements have thus far determined.  Changing unrelated consensus rules in ways which only seem to satisfy a "doesn't fail yet on the existing chain" level of proof to fix a maybe service could be denied here node behavior seems, frankly, absurd.  Additionally, it seems just kind of lazy to the point of reckless indifference:  "I can't be bothered to implement some P2P within the performance envelope I desire, though it clearly can be through purely local changes, so I'm going to insist that the core consensus algorithm be changed."
It is too much!  Cheesy
No, it is not a lazy approach, whatever. It is about stopping clients from pretending to be honest and innocently following a bogus chain when they are attempting to query a hypothetical chain with a height of one billion when we expect just 500-600 thousand. That simple.

Quote
In what universe does it make sense to address a simple implementation programming question through speculative consensus rules with arbitrary parameters that would provably have failed already for some choices of the arbitrary parameters?
In my universe, every rule has a criterion to meet, and it is a loose definition to say: the longer chain is always the better chain. I'll set a condition on this rule: as long as its height is not obviously bogus.

Quote
... some of us don't think of Bitcoin as a pet science project, understand that it's difficult to reason about the full extent of changes, and actually care if it survives.  If you share these positions you should reconsider your arguments, because I don't think they make sense in light of them.
I don't agree. I'll provide proof that there exists a safe threshold, for a long enough period, that the blockchain won't break through, and that it is getting smaller over time because of the inertia of the installed hash power.

I suppose with such a proof, it is ok to put a cap on LCR, right?
gmaxwell | Moderator, Legendary | Activity: 4172 | Merit: 8420
August 05, 2018, 01:58:35 AM  #27
Merited by Foxpup (3)

According to my suggested threshold of 60480, it doesn't fail.
No, but it would fail the _existing_ chain with a threshold of more than half that one. So your ad hoc rule doesn't even have a safety margin of 2 against the existing chain-- much less against what is reasonably possible.  If hashrate grows quickly this threshold will be exceeded.  There is no constant threshold where large hashrate growth cannot exceed it.

Quote
No, it is not a lazy approach, whatever.

I don't find 'whatever' to be much of an argument. It still sounds to me like you think the whole network should make potentially dangerous changes to the consensus rules in order to avoid having to correctly process a megabyte of network data without undue load. How isn't that lazy?
aliashraf | Legendary | Activity: 1456 | Merit: 1174
August 05, 2018, 01:37:29 PM  #28
Last edit: August 06, 2018, 11:54:26 AM by aliashraf

According to my suggested threshold of 60480, it doesn't fail.
No, but it would fail the _existing_ chain with a threshold of more than half that one. So your ad hoc rule doesn't even have a safety margin of 2 against the existing chain-- much less against what is reasonably possible.
Once it is about to be merged, we would adjust the threshold for the current block height such that from that moment we would have the required 30 epochs (difficulty adjustments) of headroom ahead.

Quote
If hashrate grows quickly this threshold will be exceeded.  There is no constant threshold where large hashrate growth cannot exceed it.
You are wrong here. With just a minor tweak in the algorithm, it is provably safe even for much smaller thresholds. The only problem would be the bootstrap nightmare, for which I've proposed such a large window in the first place, buying time (like a year) for adjusting the parameter hardcoded in the source and promoting it. I'll thoroughly describe and demonstrate the validity of these claims within the next few hours, after finishing my work.

Quote
Quote
No, it is not a lazy approach, whatever.
I don't find 'whatever' to be much of an argument. It still sounds to me like you think the whole network should make potentially dangerous changes to the consensus rules in order to avoid having to correctly process a megabyte of network data without undue load. How isn't that lazy?

As I see it, you are super conservative when it comes to messing with the core algorithm. Good for you. I admit this is absolutely necessary most of the time, but I'm afraid in this case it is not helping.

First of all, I'm not suggesting this change just for mitigating the weak (let's put it this way) DoS vulnerability that OP has pointed out. I'm now more concerned about much more critical problems like malicious bootstrapping, which you used up-thread as a possibility for innocent clients who should not be isolated because of it, remember?

Plus, I'm confident this policy, putting a cap on LCR, is a good design pattern and will help secure the network efficiently, as a general improvement in modelling the integrity of the protocol and as a 'discovery' of a critical relationship between the longest chain rule and other components of the protocol.

Back to conservatism: your alternative proposal for mitigating the (weak) vulnerability under consideration is not inspiring at all. I think the price we should pay for Bitcoin being an operational network does not have to be so high that we become superstitious about touching 'sensitive parts'.

Anyway, I'll submit a post covering both a more refined algorithm and a comprehensive proof of its safety and applicability, in a few hours as I promised, and I suppose it will be a lot of fun to discuss  Wink

EDIT: I just realized it would be more appropriate to do this in a more formal way. My apologies for the delay; I will do my best to compensate with a thoughtful article asap.
Coinr8d (OP) | Newbie | Activity: 13 | Merit: 17
August 11, 2018, 07:39:37 AM  #29

I'm sorry I have not yet been able to provide the data. It would take me a little more time to get to this again, but I can see Greg published a proposal and also implemented a fix that was already merged into Bitcoin Core. Thus it no longer feels so important to provide the data, as the problem, if there was one, has been solved.

Thank you Greg for your time you put into this.

I will thus only try to provide the data later if someone requests it again.
aliashraf | Legendary | Activity: 1456 | Merit: 1174
August 12, 2018, 03:04:36 PM  #30

It is good to see that @gmaxwell has made something out of this discussion. He is the pragmatist type.  Cool

Noticing that the pull request is a bit confused about the block locator maximum size, I made a comment. I'm now busy solving a more general problem: What's the maximum feasible length of a blockchain at any specific time?

Although the existence of such a boundary is intuitively obvious, and I have already progressed a lot, the proof takes a bit more work to be formalized.  Wink
aliashraf | Legendary | Activity: 1456 | Merit: 1174
August 16, 2018, 11:08:26 AM  #31
Last edit: August 17, 2018, 09:27:12 PM by aliashraf

I just published my work on the blockchain height divergence issue in a separate thread. I hope it will help give a relatively better understanding of the subject and of problems like the one we have been discussing here in this topic.  Smiley
poblico | Member | Activity: 75 | Merit: 10
August 17, 2018, 03:53:03 PM  #32

I'm sorry I have not yet been able to provide the data. It would take me a little more time to get to this again, but I can see Greg published a proposal and also implemented a fix that was already merged into Bitcoin Core. Thus it no longer feels so important to provide the data, as the problem, if there was one, has been solved.

Thank you Greg for your time you put into this.

I will thus only try to provide the data later if someone requests it again.


FYI this was fixed in this PR
https://github.com/bitcoin/bitcoin/pull/13907