Bitcoin Forum
Author Topic: Improving the word-breaker (SMF patch)  (Read 219 times)
PowerGlove (OP)
February 17, 2025, 02:54:32 AM
Merited by LoyceV (42), dkbit98 (15), vapourminer (12), ABCbits (9), Cyrus (1), Lafu (1), BlackBoss_ (1)
 #1

I ran into a nice suggestion from a long time ago (~12 years):

Quote
SMF breaks up long words by inserting a space every 79 characters (it is a space in a <span> with a negative margin). Example: here are 120 'a' characters:

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

When copying/pasting the chars, the space is visible at the 80th position, which is very annoying...

Instead, SMF should insert the standardized <wbr> tag (word break opportunity) already recognized by most browsers. In theory <wbr> is identical to U+200B (ZERO-WIDTH SPACE) but this is false; for example the current Chrome version on Linux (Version 23.0.1271.97) replaces U+200B with '#' when copying & pasting to a non-UTF8 application, whereas <wbr> is nicely invisible...

Maybe newer members haven't encountered this particular issue before, but I know exactly what the above poster is complaining about, and I'm sure that many other people do, too.

Basically (with some exceptions that aren't worth getting into), if you post an unbroken sequence of 80 or more letters, digits, underscores and/or periods, then it'll automatically get divided into chunks of 79 characters each (or fewer, in the case of the final chunk).
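
To make the chunking rule concrete, here is a minimal sketch in Python (not SMF's PHP). The limit of 80 is inferred from the description above, the delimiter set is copied from the regex in the diff further down, the function name is made up for illustration, and a plain space stands in for the old <span> breaker:

```python
import re

LIMIT = 80  # assumed fixLongWords value, inferred from "80 or more" above

# A run of LIMIT+ word characters or periods, either at the start of the
# text or right after one of the delimiters used in the diff's regex.
LONG_WORD = re.compile(r"(?:(?<=[>;:!? \xa0\]()])|^)([\w.]{%d,})" % LIMIT)


def break_long_words(text, breaker="<wbr />"):
    """Append `breaker` after every full group of LIMIT - 1 characters."""
    def split(match):
        return re.sub(
            r"(.{%d})" % (LIMIT - 1),
            lambda m: m.group(1) + breaker,
            match.group(1),
        )
    return LONG_WORD.sub(split, text)


# mrb's example: 120 'a' characters.
old_style = break_long_words("a" * 120, breaker=" ")
new_style = break_long_words("a" * 120)
print([len(chunk) for chunk in old_style.split(" ")])  # [79, 41]
print(new_style == "a" * 79 + "<wbr />" + "a" * 41)    # True
```

With the old breaker, the pasted text is 79 characters, a space, then 41 characters, which is exactly what mrb describes.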

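So, for example, if you were to post the SHA-512 of the text "How would you like to suck my balls, Mr. Garrison?", then instead of it appearing like this:

[image not preserved: the hash shown as one unbroken string]

It would appear like this:

[image not preserved: the same hash with a small gap after the 79th character]

(See that tiny space at around the two-thirds point? Between dbcc and 9e06?)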

That "breaker" space will, among other things, cause problems with copy-pasting (because there's now a space in the content that the author didn't intend), and will also affect double-click selecting, like this:

[image not preserved: a double-click selecting only part of the hash]

So, what I've done is add a new kind of "breaker" to SMF (to go along with the three existing variations, which are chosen based on which browser the BBCode parser thinks it's producing markup for). This new breaker avoids the problems described above and is used by default (but I've left the older breakers accessible via a context variable, in case theymos wishes to selectively flip between the new and the old behavior).
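
As a rough illustration of why <wbr /> avoids the stray space: the old breaker is a <span> whose text content is a literal space, while <wbr /> has no text content at all. The Python sketch below only approximates "what gets copied" by collecting character data with the standard library's HTMLParser; real browsers do more than this, and the helper names are made up:

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collects character data and ignores every tag."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def copied_text(markup):
    # Rough stand-in for the text a browser would put on the clipboard.
    parser = TextOnly()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)


OLD_BREAKER = '<span style="margin: 0 -0.5ex 0 0;"> </span>'  # Gecko variant
NEW_BREAKER = "<wbr />"

head, tail = "a" * 79, "a" * 41
print(copied_text(head + OLD_BREAKER + tail) == head + " " + tail)  # True
print(copied_text(head + NEW_BREAKER + tail) == "a" * 120)          # True
```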

Here's the diff:

Code:
--- baseline/Sources/Subs.php	2011-09-17 21:59:55.000000000 +0000
+++ modified/Sources/Subs.php	2025-02-16 23:39:26.000000000 +0000
@@ -1860,24 +1860,27 @@
 	if (!empty($modSettings['fixLongWords']) && $modSettings['fixLongWords'] > 5)
 	{
 		// This is SADLY and INCREDIBLY browser dependent.
 		if ($context['browser']['is_gecko'] || $context['browser']['is_konqueror'])
 			$breaker = '<span style="margin: 0 -0.5ex 0 0;"> </span>';
 		// Opera...
 		elseif ($context['browser']['is_opera'])
 			$breaker = '<span style="margin: 0 -0.65ex 0 -1px;"> </span>';
 		// Internet Explorer...
 		else
 			$breaker = '<span style="width: 0; margin: 0 -0.6ex 0 -1px;"> </span>';
 
+		if ($context['bbc_use_modern_breaker'] ?? true)
+			$breaker = '<wbr />';
+
 		// PCRE will not be happy if we don't give it a short.
 		$modSettings['fixLongWords'] = (int) min(65535, $modSettings['fixLongWords']);
 
 		// The idea is, find words xx long, and then replace them with xx + space + more.
 		if (strlen($data) > $modSettings['fixLongWords'])
 		{
 			// This is done in a roundabout way because $breaker has "long words" :P.
 			$data = strtr($data, array($breaker => '< >', '&nbsp;' => $context['utf8'] ? "\xC2\xA0" : "\xA0"));
 			$data = preg_replace(
 				'~(?<=[>;:!? ' . $non_breaking_space . '\]()]|^)([\w\.]{' . $modSettings['fixLongWords'] . ',})~e' . ($context['utf8'] ? 'u' : ''),
 				"preg_replace('/(.{" . ($modSettings['fixLongWords'] - 1) . '})/' . ($context['utf8'] ? 'u' : '') . "', '\\\$1< >', '\$1')",
 				$data);
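
The selection logic in the diff above can be sketched outside SMF like this (Python, illustrative only; the dictionary layout and function name are assumptions, and only the breaker strings and the default-to-true behavior come from the diff):

```python
# Breaker markup copied from the diff; the dictionary keys are hypothetical.
GECKO_BREAKER = '<span style="margin: 0 -0.5ex 0 0;"> </span>'
OLD_BREAKERS = {
    "gecko": GECKO_BREAKER,
    "konqueror": GECKO_BREAKER,
    "opera": '<span style="margin: 0 -0.65ex 0 -1px;"> </span>',
    "other": '<span style="width: 0; margin: 0 -0.6ex 0 -1px;"> </span>',
}
MODERN_BREAKER = "<wbr />"


def choose_breaker(browser, context):
    """Pick the breaker the way the patched block does."""
    breaker = OLD_BREAKERS.get(browser, OLD_BREAKERS["other"])

    # Mirrors `$context['bbc_use_modern_breaker'] ?? true`:
    # a missing or null flag counts as true.
    flag = context.get("bbc_use_modern_breaker")
    if flag is None:
        flag = True
    if flag:
        breaker = MODERN_BREAKER
    return breaker


print(choose_breaker("gecko", {}))                                 # <wbr />
print(choose_breaker("opera", {"bbc_use_modern_breaker": False}))  # Opera's old <span>
```

Because the flag defaults to true, any code path that never sets it gets the new breaker, and only callers that explicitly set it to false fall back to the browser-specific spans.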



When doing these sorts of fixes, I always hem and haw on whether or not it makes sense to keep using the older behavior on old posts, and only use the newer behavior on new posts...

I guess I don't much like the idea of posts changing in ways that the author didn't account for (for example, I hate that one of my own posts was mangled by the wordfilter at some point after I authored it). I (mostly) lean toward wanting to keep old posts displaying as they did at the time they were authored. For example, in the post I quoted from above, mrb is demonstrating the problem he's describing: if you try to copy-paste his example sequence, you'll get the result he's talking about, namely 79 characters, followed by a space, followed by 41 characters. But after this fix is applied, if you try to copy-paste mrb's example, you'll get 120 contiguous characters, so it'll look (from the perspective of someone reading that post cold) like mrb must have been confused or mistaken when he constructed that example.

I dunno, maybe I'm just overthinking things, but, in case theymos feels that the old behavior is worth preserving, here are two additional diffs that will make sure (at least, in the two places that are important, I think) that old posts won't be affected by this fix:

Code:
--- baseline/Sources/Display.php	2011-02-07 16:45:09.000000000 +0000
+++ modified/Sources/Display.php	2025-02-17 01:11:58.000000000 +0000
@@ -878,24 +878,26 @@
 	else
 	{
 		$memberContext[$message['ID_MEMBER']]['can_view_profile'] = allowedTo('profile_view_any') || ($message['ID_MEMBER'] == $ID_MEMBER && allowedTo('profile_view_own'));
 		$memberContext[$message['ID_MEMBER']]['is_topic_starter'] = $message['ID_MEMBER'] == $context['topic_starter_id'];
 	}
 
 	$memberContext[$message['ID_MEMBER']]['ip'] = $message['posterIP'];
 
 	// Do the censor thang.
 	censorText($message['body']);
 	censorText($message['subject']);
 
+	$context['bbc_use_modern_breaker'] = (int)$message['ID_MSG'] >= 65073000;
+
 	// Run BBC interpreter on the message.
 	$message['body'] = parse_bbc($message['body'], $message['smileysEnabled'], $message['ID_MSG']);
 
 	// Compose the memory eat- I mean message array.
 	$output = array(
 		'attachment' => loadAttachmentContext($message['ID_MSG']),
 		'alternate' => $counter % 2,
 		'id' => $message['ID_MSG'],
 		'href' => $scripturl . '?topic=' . $topic . '.msg' . $message['ID_MSG'] . '#msg' . $message['ID_MSG'],
 		'link' => '<a href="' . $scripturl . '?topic=' . $topic . '.msg' . $message['ID_MSG'] . '#msg' . $message['ID_MSG'] . '">' . $message['subject'] . '</a>',
 		'member' => &$memberContext[$message['ID_MEMBER']],
 		'icon' => $message['icon'],

Code:
--- baseline/Sources/Profile.php	2013-10-21 19:01:11.000000000 +0000
+++ modified/Sources/Profile.php	2025-02-17 01:12:02.000000000 +0000
@@ -1479,24 +1479,26 @@
 	}
 
 	// Start counting at the number of the first message displayed.
 	$counter = $reverse ? $context['start'] + $maxIndex + 1 : $context['start'];
 	$context['posts'] = array();
 	$board_ids = array('own' => array(), 'any' => array());
 	while ($row = mysql_fetch_assoc($request))
 	{
 		// Censor....
 		censorText($row['body']);
 		censorText($row['subject']);
 
+		$context['bbc_use_modern_breaker'] = (int)$row['ID_MSG'] >= 65073000;
+
 		// Do the code.
 		$row['body'] = parse_bbc($row['body'], $row['smileysEnabled'], $row['ID_MSG']);
 
 		// And the array...
 		$context['posts'][$counter += $reverse ? -1 : 1] = array(
 			'body' => $row['body'],
 			'counter' => $counter,
 			'category' => array(
 				'name' => $row['cname'],
 				'id' => $row['ID_CAT']
 			),
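
The cutoff in the two diffs above boils down to a per-message check made just before parsing. A small Python sketch of that idea follows; the threshold is taken from the diffs, while the function names and the sample message IDs are made up for illustration:

```python
CUTOFF_MSG_ID = 65073000  # threshold used in both diffs


def use_modern_breaker(msg_id):
    # Mirrors `(int)$message['ID_MSG'] >= 65073000`; IDs may arrive as strings.
    return int(msg_id) >= CUTOFF_MSG_ID


def breaker_for(msg_id, old_breaker=" "):
    # A plain space stands in for whichever old <span> breaker would apply.
    return "<wbr />" if use_modern_breaker(msg_id) else old_breaker


# Hypothetical IDs on either side of the cutoff.
for msg_id in ("1423000", "65072999", "65073000"):
    print(msg_id, repr(breaker_for(msg_id)))
```

Under this scheme, a post like mrb's (authored long before the cutoff) would keep rendering with the old breaker, so his 79 + space + 41 demonstration would still behave as he described.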
LoyceV
February 17, 2025, 09:26:05 AM
Merited by vapourminer (1), PowerGlove (1)
 #2

I often ran into this, for instance when copying public keys. My solution was to click quote and copy it from the edit window. Using code tags also prevents it.
I guess it's one of those little problems I'm so used to that I didn't even realize it could be patched. So it would be good to fix :)

¡uʍop ǝpᴉsdn pɐǝɥ ɹnoʎ ɥʇᴉʍ ʎuunɟ ʞool no⅄
Cyrus
March 27, 2025, 05:47:48 PM
Merited by vapourminer (1), PowerGlove (1)
 #3

Bumping this to confirm that in the account recovery process we bumped into this quite a few times. Thanks for bringing this back up and offering a fix!

joker_josue
March 27, 2025, 07:59:29 PM
 #4

It turns out this is something I had never noticed before. Maybe because I've never had to select text of that length.

It makes perfect sense for this to be corrected and, as you point out very well, for old posts to be left unchanged. What's done is done, let it go.

So, it's a smart solution that would be a good addition to the forum.

 
shahzadafzal
March 27, 2025, 08:37:05 PM
 #5

Quote
I dunno, maybe I'm just overthinking things, but, in case theymos feels that the old behavior is worth preserving, here are two additional diffs that will make sure (at least, in the two places that are important, I think) that old posts won't be affected by this fix:

Well, if you ask me, that's a bug, and you're fixing it. There shouldn't be any issue implementing this change, and I think it should be fixed in the old posts too. After all, you're just adjusting how it behaves on the front end without affecting the original posts.

On the other hand, I guess that's why SMF has the [ code ][/ code ] tag for such things.

logfiles
March 31, 2025, 12:16:50 AM
 #6

Hey @PowerGlove

Sorry if this is a dumb question to ask here, since the two cases might not be related. Sometimes long links get broken up, especially ones containing a transaction ID or pointing to a spreadsheet.

An example is this text of mine (I don't want to quote it because it will all fit in):

I just wanted to inform you that 150 USDT has been sent to the address ( TXID ---> https://tronscan.org/#/transaction/3beb8505a451a92a6470eabbf3b67afb4ae59d19a6c0cbea879210b3550a3e9a).

Why doesn't the whole link just drop into the next line if it doesn't fit in the line on top?

LoyceV
March 31, 2025, 08:19:42 AM
 #7

Quote
I just wanted to inform you that 150 USDT has been sent to the address ( TXID ---> https://tronscan.org/#/transaction/3beb8505a451a92a6470eabbf3b67afb4ae59d19a6c0cbea879210b3550a3e9a).
One fix would be to manually type URL tags, like 3beb8505a451a92a6470eabbf3b67afb4ae59d19a6c0cbea879210b3550a3e9a or tronscan.org.

Quote
Why doesn't the whole link just drop into the next line if it doesn't fit in the line on top?
If that's what you want, just type Enter before the link.
