Bitcoin Forum
Author Topic: Improving the auto-linker (SMF patch)  (Read 184 times)
PowerGlove (OP)
September 07, 2023, 03:26:56 PM
Merited by theymos (20), hosseinimr93 (10), EFS (8), dbshck (8), ABCbits (7), LoyceV (6), Mitchell (5), DdmrDdmr (4), TryNinja (3), Pmalek (3), Husna QA (3), Cyrus (2), ibminer (2), joker_josue (2), Rizzrack (2), un_rank (2), Mahdirakib (1), Rikafip (1), BenCodie (1)
 #1

There was a recent Meta thread about the auto-linker sometimes failing to properly recognize URLs, and my name came up, so I decided to poke around and see if I could make sense of this bug.

As a recap, the auto-linker can sometimes be confused by leading spaces (particularly after a post has been edited, or quoted). For example, if you post the following (a sequence of URLs with an increasing amount of leading space, meant to showcase the problem):

Code:
www.thefarside.com
 www.thefarside.com
  www.thefarside.com
   www.thefarside.com
    www.thefarside.com
     www.thefarside.com
      www.thefarside.com
       www.thefarside.com

Then it'll (initially) render correctly, like this:

[Screenshot not preserved.]

But after an edit (even one that doesn't change anything), it'll render incorrectly, like this (i.e. links with 2/4/6 leading spaces no longer recognized):

[Screenshot not preserved.]

If the original post is quoted, then it'll render like this (i.e. links with 3/5/7 leading spaces no longer recognized):

[Screenshot not preserved.]

(And if the quoted post were edited, it would revert to links with 2/4/6 leading spaces no longer being recognized.)

Pretty weird, huh?

Now, I know there are a few places in SMF where whitespace conversions happen (that's part of the reason I did the [nbsp] patch, so that non-breaking spaces could be used in a way that wouldn't be undone by those conversions). So, I don't find this bug that perplexing (though, I was surprised that the bug persisted even after bypassing preparsecode() and un_preparsecode(); I had figured that something in one of those two functions was behind spacing not "round-tripping" correctly on SMF).

Anyway, regardless of the ultimate source(s) of spacing getting silently messed with when you edit (or quote) a post, this particular bug is caused by the URL regexes in the auto-linker not properly taking this state of affairs into account (which is odd, because the e-mail regexes do). Specifically, the positive lookbehind assertions aren't aware of non-breaking spaces (and the second regex, the one for schemeless URLs, needs an additional tweak in order to prevent this bug from sometimes presenting during post preview).
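
A minimal sketch of the lookbehind problem, not the forum's actual code path: it runs trimmed-down versions of the old and patched schemeless-URL regexes (path handling removed) against a URL preceded by an ordinary space and one preceded by a non-breaking space. The names $old, $new, $plain and $mixed and the test strings are made up for illustration. It assumes a UTF-8 forum (so a non-breaking space is the two bytes C2 A0) and the default "C" locale, where \s does not match the A0 byte.

```php
<?php
// Old schemeless-URL pattern, trimmed: the lookbehind knows nothing about NBSP.
$old = '~(?<=[\s>(\'<]|^)(www(?:\.[\w\-_]+)+)~i';

// Patched pattern, trimmed: NBSP is added as its own lookbehind alternative.
// Assumption: UTF-8, so the NBSP is written as the byte pair \xc2\xa0.
$nbsp = '\xc2\xa0';
$new = '~(?<=[\s>(;\'<]|' . $nbsp . '|^)(www(?:\.[\w\-_]+)+)~i';

// Hypothetical inputs: two ordinary spaces vs. a space followed by an NBSP.
$plain = "  www.thefarside.com";
$mixed = " \xC2\xA0www.thefarside.com";

echo preg_match($old, $plain), "\n"; // ordinary space before the URL
echo preg_match($old, $mixed), "\n"; // NBSP before the URL, old regex
echo preg_match($new, $mixed), "\n"; // NBSP before the URL, patched regex
```

Under those assumptions this should print 1, 0, 1: the old lookbehind rejects a URL whose immediate predecessor is a non-breaking space, and the patched one accepts it. The NBSP is an alternation branch rather than a character-class member because, without the "u" modifier, a two-byte sequence inside a class would be treated as two separate bytes.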

Here's the diff for @theymos:

Code:
--- baseline/Sources/Subs.php	2011-09-17 21:59:55.000000000 +0000
+++ modified/Sources/Subs.php 2023-09-07 15:04:45.000000000 +0000
@@ -1820,36 +1820,39 @@
 
  // Don't go backwards.
  //!!! Don't think is the real solution....
  $lastAutoPos = isset($lastAutoPos) ? $lastAutoPos : 0;
  if ($pos < $lastAutoPos)
  $no_autolink_area = true;
  $lastAutoPos = $pos;
 
  if (!$no_autolink_area)
  {
  // Parse any URLs.... have to get rid of the @ problems some things cause... stupid email addresses.
  if (!isset($disabled['url']) && (strpos($data, '://') !== false || strpos($data, 'www.') !== false))
  {
  // Switch out quotes really quick because they can cause problems.
  $data = strtr($data, array('&#039;' => '\'', '&nbsp;' => $context['utf8'] ? "\xC2\xA0" : "\xA0", '&quot;' => '>">', '"' => '<"<', '&lt;' => '<lt<'));
 
+ // Can't make use of $non_breaking_space in the URL regexes (that definition won't work without the "u" modifier).
+ $nbsp = $context['utf8'] ? '\xc2\xa0' : '\xa0';
+
  // Only do this if the preg survives.
  if (is_string($result = preg_replace(array(
- '~(?<=[\s>\.(;\'"]|^)((?:http|https|ftp|ftps)://[\w\-_%@:|]+(?:\.[\w\-_%]+)*(?::\d+)?(?:/[\w\-_\~%\.@,\?&;=#(){}+:\'\\\\]*)*[/\w\-_\~%@\?;=#}\\\\])~i',
- '~(?<=[\s>(\'<]|^)(www(?:\.[\w\-_]+)+(?::\d+)?(?:/[\w\-_\~%\.@,\?&;=#(){}+:\'\\\\]*)*[/\w\-_\~%@\?;=#}\\\\])~i'
+ '~(?<=[\s>\.(;\'"]|' . $nbsp . '|^)((?:http|https|ftp|ftps)://[\w\-_%@:|]+(?:\.[\w\-_%]+)*(?::\d+)?(?:/[\w\-_\~%\.@,\?&;=#(){}+:\'\\\\]*)*[/\w\-_\~%@\?;=#}\\\\])~i',
+ '~(?<=[\s>(;\'<]|' . $nbsp . '|^)(www(?:\.[\w\-_]+)+(?::\d+)?(?:/[\w\-_\~%\.@,\?&;=#(){}+:\'\\\\]*)*[/\w\-_\~%@\?;=#}\\\\])~i'
  ), array(
  '[url]$1[/url]',
  '[url=http://$1]$1[/url]'
  ), $data)))
  $data = $result;
 
  $data = strtr($data, array('\'' => '&#039;', $context['utf8'] ? "\xC2\xA0" : "\xA0" => '&nbsp;', '>">' => '&quot;', '<"<' => '"', '<lt<' => '&lt;'));
  }
 
  // Next, emails...
  if (!isset($disabled['email']) && strpos($data, '@') !== false)
  {
  $data = preg_replace('~(?<=[\?\s' . $non_breaking_space . '\[\]()*\\\;>]|^)([\w\-\.]{1,80}@[\w\-]+\.[\w\-\.]+[\w\-])(?=[?,\s' . $non_breaking_space . '\[\]()*\\\]|$|<br />|&nbsp;|&gt;|&lt;|&quot;|&#039;|\.(?:\.|;|&nbsp;|\s|$|<br />))~' . ($context['utf8'] ? 'u' : ''), '[email]$1[/email]', $data);
  $data = preg_replace('~(?<=<br />)([\w\-\.]{1,80}@[\w\-]+\.[\w\-\.]+[\w\-])(?=[?\.,;\s' . $non_breaking_space . '\[\]()*\\\]|$|<br />|&nbsp;|&gt;|&lt;|&quot;|&#039;)~' . ($context['utf8'] ? 'u' : ''), '[email]$1[/email]', $data);
  }
  }

(Because this patch amounts to adjusting a pair of regexes in the BBCode parser, it will both fix this bug moving forward, and retroactively fix old posts that have unclickable links in them due to this issue, like this one.)
theymos
September 08, 2023, 07:48:01 PM
Merited by PowerGlove (10), EFS (2), Pmalek (2), Husna QA (1)
 #2

Done, thanks! What a monstrous regex...

I'm 95% sure that this change is correct, but if anyone notices this breaking any posts, let me know.

Pmalek
September 09, 2023, 08:57:43 AM
Merited by PowerGlove (1)
 #3

I am the one who opened the Meta thread you are talking about, OP. For testing purposes, I am going to link to it here with different numbers of spaces before the link to see if the bug has been fixed. I will also edit my post once without making any changes, and then a second time with just a minor change, to see if that affects anything.

Edit 2:

Testing if the bug is gone https://bitcointalk.org/index.php?topic=5465210.0
Testing if the bug is gone  https://bitcointalk.org/index.php?topic=5465210.0
Testing if the bug is gone   https://bitcointalk.org/index.php?topic=5465210.0
Testing if the bug is gone    https://bitcointalk.org/index.php?topic=5465210.0

Edit 3 and 4: It works

cafter
September 09, 2023, 09:56:04 AM
 #4

I came across this bug via this thread. After reading some replies, I replied to this thread in the Bitcoin Discussion board, and the URL there is not clickable.
Has this issue not been resolved yet?


joker_josue
September 09, 2023, 01:33:47 PM
Merited by TryNinja (1), Husna QA (1)
 #5

Another great job @PowerGlove. Thanks!



Quote from: cafter on September 09, 2023, 09:56:04 AM
I came across this bug via this thread. After reading some replies, I replied to this thread in the Bitcoin Discussion board, and the URL there is not clickable.
Has this issue not been resolved yet?



This is not a code issue or bug. The forum only auto-links URLs that start with a scheme (http://, https://, ftp:// or ftps://) or with www.
That is, if you write bitcointalk.org, it does not create a link. But if you write https://bitcointalk.org or www.bitcointalk.org, it does.
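
As a rough illustration of that rule, here is a sketch built from trimmed-down copies of the two auto-linker regexes in the diff from post #1 (path handling removed). The function name autolink() is hypothetical, and this is not the forum's actual parser, only the pre-check and the two patterns in isolation.

```php
<?php
// Hypothetical helper: applies simplified versions of the two URL patterns.
function autolink(string $text): string
{
    // Same cheap pre-check as in Subs.php: no '://' and no 'www.' means nothing to do.
    if (strpos($text, '://') === false && strpos($text, 'www.') === false) {
        return $text;
    }

    return preg_replace(
        array(
            // URLs with an explicit scheme (http, https, ftp, ftps).
            '~(?<=[\s>\.(;\'"]|^)((?:http|https|ftp|ftps)://[\w\-_%@:|]+(?:\.[\w\-_%]+)*)~i',
            // Schemeless URLs, which must start with "www".
            '~(?<=[\s>(;\'<]|^)(www(?:\.[\w\-_]+)+)~i',
        ),
        array(
            '[url]$1[/url]',
            '[url=http://$1]$1[/url]',
        ),
        $text
    );
}

foreach (array('bitcointalk.org', 'https://bitcointalk.org', 'www.bitcointalk.org') as $input) {
    echo autolink($input), "\n";
}
```

The bare bitcointalk.org comes back unchanged, while the other two are wrapped in [url] tags, which the BBCode parser then turns into clickable links.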

cafter
September 09, 2023, 01:41:38 PM
 #6

<snip>

Now I added "www." at the beginning of the link and it became a nice clickable link. It was confusing to work out what the exact problem was and what PowerGlove solved, because I am not a coder and don't know much about technical things. Thanks for clearing that up :)

Powered by SMF 1.1.19 | SMF © 2006-2009, Simple Machines