Hey, here's a pure regex solution that should work in most circumstances.
```php
<?php
$regex = <<<'REGEXP'
@\[url= # Start of URL BBCode
( # Group 1
((?:https?:\/\/)?) # Group 2, capture protocol if it exists
([\da-z\.-]+) # Group 3, capture the hostname, without the TLD
\. # Need a period between the hostname and TLD
([a-z\.]{2,6}) # Group 4, TLD
([\/\w .-]*) # Group 5, path
)
\]
\s*? # Allow whitespace (including newlines) before the link text
( # Group 6
(?!.*\3.+\4|.*\[img\].*) # Lookahead: skip link text that contains the real domain, or an image
.*?
((?:https?:\/\/)?) # Group 7, phishing URL protocol
( # Group 8, phishing URL host
(?:[\da-z\.-] # Match any characters normally found in a URL
| # or
\[[^\]]+\] # Match any BBCode
| # or
[^\x00-\x7F])+ # Match any non-ASCII (Unicode) bytes
)
(?:\.|[^\x00-\x7F]+) # Need a period, but also accept Unicode characters posing as one
([a-z\.]{2,6}) # Group 9, phishing URL TLD
([\/\w .-]*) # Group 10, phishing URL path
.*?
| # or
[^ ]+\[img\].*\[\/img\][^ ]+ # An image with anything other than spaces surrounding it
)
\s*? # Find any whitespace in between
\[/url\] # End of URL BBCode
@xmi
REGEXP;
```
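This will fail if a Unicode character is used inside a word (e.g. `Hello☺World`), because any non-ASCII character is accepted in place of the period between host and TLD. Other than that it should be selective enough. There's an example of this failing at the bottom of the post.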
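To see why, here is a sketch that pulls just the displayed-host part of the pattern (group 8, the separator and the TLD group) out on its own; `$hostPart` is only an illustrative name. Since the pattern has no `u` modifier, `[^\x00-\x7F]` matches single bytes, so one byte of a multi-byte UTF-8 character can stand in for the period.

```php
<?php
// Illustrative excerpt of the pattern: displayed host, separator, TLD.
// No `u` modifier, so [^\x00-\x7F] matches individual bytes of UTF-8 sequences.
$hostPart = '@((?:[\da-z.-]|\[[^\]]+\]|[^\x00-\x7F])+)(?:\.|[^\x00-\x7F]+)([a-z.]{2,6})@i';

// "\u{263A}" is a smiley (three bytes in UTF-8). Glued between two words, one of
// its bytes is taken as the "period", so "World" is read as the TLD.
$glued = preg_match($hostPart, "Hello\u{263A}World", $m);
echo $glued, ' ', $m[2], PHP_EOL; // 1 World

// With spaces around it, no non-ASCII byte is directly followed by TLD letters,
// so there is no match.
$spaced = preg_match($hostPart, "Hello \u{263A} World");
echo $spaced, PHP_EOL; // 0
```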
It will also fail when the link text replaces the legitimate URL's `.` with a space (or any other non-alphanumeric ASCII character) while the link itself points at a phishing site, e.g. `[url=http://phishing.com]http://safe-site com[/url]`. Expanding the TLD sections to match only real-world TLDs could fix this to an extent.
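A rough sketch of that TLD idea, checked on its own rather than wired into the full pattern: treat any run of non-alphanumeric characters as the "period", but only accept a following TLD from a known list. The four-entry `$knownTlds` list and the `$disguisedDomain` name are hypothetical; a usable list would be much longer (IANA publishes the full set of TLDs), and a looser separator like this trades these misses for more false positives.

```php
<?php
// Hypothetical, deliberately tiny whitelist; a real one would be far longer.
$knownTlds = ['com', 'net', 'org', 'io'];

// Host-like run, then ANY run of non-alphanumeric characters standing in for the
// period, then a whitelisted TLD that is not the start of a longer word.
$tldAlternation = implode('|', array_map(
    fn (string $tld): string => preg_quote($tld, '@'),
    $knownTlds
));
$disguisedDomain = '@([\da-z-]+)[^\da-z]+(' . $tldAlternation . ')(?![a-z])@i';

// A space posing as the period is now caught...
var_dump(preg_match($disguisedDomain, 'http://safe-site com', $m)); // int(1)
echo $m[1], ' / ', $m[2], PHP_EOL;                                  // safe-site / com

// ...and "World" after a smiley is no longer taken for a TLD.
var_dump(preg_match($disguisedDomain, "Hello\u{263A}World"));       // int(0)
```

Arrow functions (`fn`) need PHP 7.4 or later.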
| Result | Input | Output |
|---|---|---|
| success | `[url=http://phishing.com]http://safe-site.com/login.php[/url]` | `[url=http://phishing.com]http://phishing.com[/url]` |
| success | `[url=phishing.com] safe-site.com[/url]` | `[url=phishing.com]phishing.com[/url]` |
| success | `[url=http://phishing.com]http://safe-site.com/login.php[/url][nobbc]http://safe-site.com/login.php[/url][/nobbc]` | `[url=http://phishing.com]http://phishing.com[/url][nobbc]http://safe-site.com/login.php[/url][/nobbc]` |
| success | `[url=http://phishing.com]Welcome to safe-site.com![/url]` | `[url=http://phishing.com]http://phishing.com[/url]` |
| success | `[url=http://phishing.com]safe-site.com[/url]` | `[url=http://phishing.com]http://phishing.com[/url]` |
| success | `[url=http://phishing.com][b]safe[/b]-site.com[/url]` | `[url=http://phishing.com]http://phishing.com[/url]` |
| success | `[url=http://phishing.com]safe-site.io[/url]` | `[url=http://phishing.com]http://phishing.com[/url]` |
| success | `[url=http://phishing.com]safe-site⠠com[/url]` | `[url=http://phishing.com]http://phishing.com[/url]` |
| success | `[url=http://phishing.com]safe-site .com[/url]` | `[url=http://phishing.com]http://phishing.com[/url]` |
| success | `[url=http://phishing.com]s a f e-s i t e.com[/url]` | `[url=http://phishing.com]http://phishing.com[/url]` |
| success | `[url=http://safe-site.io]safe-site.com[/url]` | `[url=http://safe-site.io]http://safe-site.io[/url]` |
| success | `[url=http://phishing.com]safe-site[img]http://asdf.com/period.png[/img]com[/url]` | `[url=http://phishing.com]http://phishing.com[/url]` |
| failure | `[url=http://phishing.com]http://safe-site com[/url]` | `[url=http://phishing.com]http://safe-site com[/url]` |
| success | `[url=http://safe-site.com]http://safe-site.com[/url]` | `[url=http://safe-site.com]http://safe-site.com[/url]` |
| success | `[url=safe-site.com]safe-site.com[/url]` | `[url=safe-site.com]safe-site.com[/url]` |
| success | `[url=http://safe-site.com]safe-site.com[/url]` | `[url=http://safe-site.com]safe-site.com[/url]` |
| success | `[url=safe-site.com]http://safe-site.com[/url]` | `[url=safe-site.com]http://safe-site.com[/url]` |
| success | `[url=http://safe-site.com][img]http://asdf.com/image.png[/img][/url]` | `[url=http://safe-site.com][img]http://asdf.com/image.png[/img][/url]` |
| success | `[url=http://safe-site.com] [img]http://asdf.com/image.png[/img][/url]` | `[url=http://safe-site.com] [img]http://asdf.com/image.png[/img][/url]` |
| success | `[url=http://safe-site.com]safe-site.com is a good site[/url]` | `[url=http://safe-site.com]safe-site.com is a good site[/url]` |
| success | `[url=http://safe-site.com]Welcome to safe-site.com![/url]` | `[url=http://safe-site.com]Welcome to safe-site.com![/url]` |
| success | `[url=http://safe-site.com]こんにちは。[/url]` | `[url=http://safe-site.com]こんにちは。[/url]` |
| success | | |
| success | `[url=http://safe-site.com]☺☺☺ Hello ☺☺☺[/url]` | `[url=http://safe-site.com]☺☺☺ Hello ☺☺☺[/url]` |
| failure | `[url=http://safe-site.com]Hello☺World[/url]` | `[url=http://safe-site.com]http://safe-site.com[/url]` |
| success | `[url=http://safe-site.com]Hello ☺ World[/url]` | `[url=http://safe-site.com]Hello ☺ World[/url]` |
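The post doesn't show the call that produced the last column of the results. Judging from that column, a plausible reconstruction is a `preg_replace()` that swaps suspicious link text for the link target itself (`[url=$1]$1[/url]`); the sketch below assumes exactly that. `neutralizePhishingLinks()` is a hypothetical helper name, and the pattern is the one above with both path groups written as a single character class. It treats a `null` return as an error, because `preg_replace()` returns `null` when PCRE fails (for example on hitting the backtrack limit).

```php
<?php
// The regex from the post; only the two path groups differ, written as one
// character class instead of a nested quantifier.
$regex = <<<'REGEXP'
@\[url=                         # Start of URL BBCode
(                               # Group 1, the link target
((?:https?:\/\/)?)              # Group 2, protocol if present
([\da-z\.-]+)                   # Group 3, hostname without the TLD
\.                              # Period between hostname and TLD
([a-z\.]{2,6})                  # Group 4, TLD
([\/\w .-]*)                    # Group 5, path
)
\]
\s*?                            # Optional whitespace before the link text
(                               # Group 6, the link text
(?!.*\3.+\4|.*\[img\].*)        # Skip text containing the real domain, or an image
.*?
((?:https?:\/\/)?)              # Group 7, displayed protocol
(                               # Group 8, displayed host
(?:[\da-z\.-]|\[[^\]]+\]|[^\x00-\x7F])+  # URL chars, BBCode tags or non-ASCII bytes
)
(?:\.|[^\x00-\x7F]+)            # A period, or non-ASCII bytes posing as one
([a-z\.]{2,6})                  # Group 9, displayed TLD
([\/\w .-]*)                    # Group 10, displayed path
.*?
|                               # or
[^ ]+\[img\].*\[\/img\][^ ]+    # An image with non-space text directly around it
)
\s*?                            # Whitespace before the closing tag
\[/url\]                        # End of URL BBCode
@xmi
REGEXP;

// Hypothetical helper. The replacement string is an assumption inferred from the
// results: suspicious link text is replaced by the link target itself.
function neutralizePhishingLinks(string $bbcode, string $regex): string
{
    $result = preg_replace($regex, '[url=$1]$1[/url]', $bbcode);
    if ($result === null) {
        // preg_replace() returns null on a PCRE error, e.g. an exceeded backtrack limit.
        throw new RuntimeException('preg_replace() failed, PCRE error code ' . preg_last_error());
    }
    return $result;
}

$samples = [
    '[url=http://phishing.com]http://safe-site.com/login.php[/url]',
    '[url=http://phishing.com]safe-site.com[/url]',
    '[url=http://safe-site.com]Welcome to safe-site.com![/url]',
    '[url=http://safe-site.com][img]http://asdf.com/image.png[/img][/url]',
];

foreach ($samples as $input) {
    echo $input, PHP_EOL, '  -> ', neutralizePhishingLinks($input, $regex), PHP_EOL;
}
```

Run as-is, the first two samples come back as `[url=http://phishing.com]http://phishing.com[/url]` and the last two are left unchanged.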