Bitcoin Forum
October 17, 2017, 10:33:25 PM
News: Latest stable version of Bitcoin Core: 0.15.0.1
 
Pages: [1]
Topic: Something odd (Read 841 times)
dree12 (Legendary, Activity: 1246)
July 26, 2013, 09:16:53 PM (#1)

I was making some expansions to this thread recently. When I saved the post, it was cut off. However, the post is well under the 65535-character limit:

(Firefox)
Code:
[17:16:06.713] post.length
[17:16:06.719] 62075

No notice came up; the post was just cut off. The post preview worked as expected.

Why, then, was the post cut off?

theymos (Administrator, Legendary, Activity: 2814)
July 26, 2013, 10:47:48 PM (#2)

There's a 65535-byte limit. Characters not in [a-zA-Z ] require ~6 bytes with SMF's encoding, including newlines.

There should be some sort of warning if you trigger this.

1NXYoJ5xU91Jp83XfVMHwwTUyZFK64BoAD
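The ~6-byte rule above can be turned into a quick pre-submit check. This is a rough sketch, assuming 1 byte for letters, digits (theymos says further down that numbers aren't translated) and spaces, and a flat 6 bytes for everything else; `estimateStoredBytes` and `mayBeTruncated` are hypothetical names, and SMF's real per-character cost may differ.

```javascript
// Rough size estimate after SMF-style encoding. Assumption: letters, digits
// and spaces are stored as 1 byte each; every other character (punctuation,
// newlines, non-ASCII) is charged a flat ~6 bytes. Not SMF's actual code.
const POST_BYTE_LIMIT = 65535; // byte limit stated in the thread

function estimateStoredBytes(text, otherCharCost = 6) {
  let bytes = 0;
  for (const ch of text) {
    // for...of walks code points, so an emoji counts as one character
    bytes += /[a-zA-Z0-9 ]/.test(ch) ? 1 : otherCharCost;
  }
  return bytes;
}

// True when the estimate suggests the post would be silently cut off.
function mayBeTruncated(text) {
  return estimateStoredBytes(text) > POST_BYTE_LIMIT;
}

console.log(estimateStoredBytes("Hello world\n")); // 17
console.log(mayBeTruncated("\n".repeat(11000))); // true: 66000 > 65535
```

Under this estimate each special character costs 5 bytes more than a letter, so a 62075-character post overflows once it has more than (65535 - 62075) / 5 = 692 of them, i.e. only about 1% of the text.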

dree12 (Legendary, Activity: 1246)
July 26, 2013, 11:59:33 PM (#3)

Quote from: theymos
> There's a 65535-byte limit. Characters not in [a-zA-Z ] require ~6 bytes with SMF's encoding, including newlines.
>
> There should be some sort of warning if you trigger this.

This seems absolutely ridiculous. UTF-8 fits more characters into a single byte (all of ASCII), and every Unicode character can be encoded in at most 4 bytes (6 under the original UTF-8 specification). I assume, then, that my post is so large because of all the numbers, punctuation, and newlines.

Thank you for inserting my post, though.
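To make the UTF-8 comparison concrete, a small sketch using the standard `TextEncoder` API (available in browsers and Node.js); `utf8Length` is just an illustrative helper. Under RFC 3629, UTF-8 needs 1 byte for any ASCII character, including digits, punctuation, and newlines, and never more than 4 bytes.

```javascript
// UTF-8 byte cost per character, using the standard TextEncoder API.
// Since RFC 3629, UTF-8 uses 1 to 4 bytes per code point.
const encoder = new TextEncoder();

function utf8Length(text) {
  return encoder.encode(text).length;
}

for (const sample of ["a", "7", "\n", "&", "©", "中", "😀"]) {
  console.log(JSON.stringify(sample), utf8Length(sample));
}
// "a" 1, "7" 1, "\n" 1, "&" 1, "©" 2, "中" 3, "😀" 4
```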

theymos (Administrator, Legendary, Activity: 2814)
July 27, 2013, 12:16:21 AM (#4)

SMF translates all special characters into HTML entities and all newlines into <br />s before inserting text into the database. This is maybe more efficient, though I probably wouldn't have done it this way.

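A sketch of the kind of conversion described above, useful for estimating a post's stored size. It is an approximation written for illustration, not SMF's code: `toStoredForm` and `storedBytes` are hypothetical, and the exact entity spellings (`&#039;` for an apostrophe, decimal references for non-ASCII characters) are assumptions.

```javascript
// Hypothetical approximation of the conversion described above. The entity
// spellings chosen here (&#039; for an apostrophe, decimal references for
// non-ASCII) are assumptions, not taken from SMF's source.
const SPECIAL = new Map([
  ["&", "&amp;"],
  ["<", "&lt;"],
  [">", "&gt;"],
  ['"', "&quot;"],
  ["'", "&#039;"],
]);

function toStoredForm(text) {
  let out = "";
  for (const ch of text.replace(/\r\n/g, "\n")) {
    const cp = ch.codePointAt(0);
    if (ch === "\n") out += "<br />";           // 1 character -> 6 bytes
    else if (SPECIAL.has(ch)) out += SPECIAL.get(ch);
    else if (cp > 127) out += `&#${cp};`;       // non-ASCII -> numeric entity
    else out += ch;                             // plain ASCII is kept as-is
  }
  return out;
}

// The result is pure ASCII, so its string length is also its byte length.
function storedBytes(text) {
  return toStoredForm(text).length;
}

console.log(toStoredForm("1 < 2\n©")); // 1 &lt; 2<br />&#169;
```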

nimda (Hero Member, Activity: 784)
0xFB0D8D1534241423
July 27, 2013, 12:18:26 AM (#5)

Quote from: dree12
> Quote from: theymos
>> There's a 65535-byte limit. Characters not in [a-zA-Z ] require ~6 bytes with SMF's encoding, including newlines.
>>
>> There should be some sort of warning if you trigger this.
>
> This seems absolutely ridiculous. UTF-8 fits more characters into a single byte (all of ASCII), and every Unicode character can be encoded in at most 4 bytes (6 under the original UTF-8 specification). I assume, then, that my post is so large because of all the numbers, punctuation, and newlines.
>
> Thank you for inserting my post, though.

Remember, you're going into HTML. For example, an ampersand (one byte in UTF-8) must become "&amp;" (5 bytes) or "&#38;" (5 bytes). Newlines are encoded as "<br />", which is 6 bytes.

I recommend asking me for a signature from my GPG key before doing a trade. I will NEVER deny such a request.

dree12 (Legendary, Activity: 1246)
July 27, 2013, 12:38:22 AM (#6)

Quote from: theymos
> SMF translates all special characters into HTML entities and all newlines into <br />s before inserting text into the database. This is maybe more efficient, though I probably wouldn't have done it this way.

Are numbers translated too? If so, it would seem they are translated back...

Anyway, I guess that's reasonable. Personally, I would have taken a storage hit and stored both a BBCode version in UTF-8 and a cached HTML translation. That would be the most efficient option speed-wise (one translation per edit rather than one on every page view), and storage is quite cheap, especially for text. IIRC that's what Wikipedia does, and it's a major reason why they can serve so many people so quickly with very few servers.
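The "BBCode source plus cached HTML" design can be sketched in a few lines. Everything here is hypothetical and kept in memory: `PostStore` stands in for the database table, and `renderBBCode` only understands `[b]`. The point is that rendering runs once per save, not once per view.

```javascript
// Sketch of "store the BBCode source, cache the rendered HTML": rendering runs
// once per save/edit, and page views only read the cached HTML.
// renderBBCode is a hypothetical stand-in that understands only [b]...[/b].
function escapeHtml(s) {
  return s
    .replace(/&/g, "&amp;") // must run first so later entities aren't re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function renderBBCode(source) {
  return escapeHtml(source)
    .replace(/\[b\]([\s\S]*?)\[\/b\]/g, "<strong>$1</strong>")
    .replace(/\n/g, "<br />");
}

class PostStore {
  constructor() {
    this.posts = new Map(); // post id -> { source, html }
    this.renders = 0;       // how many times renderBBCode actually ran
  }

  save(id, source) {
    this.renders += 1;
    this.posts.set(id, { source, html: renderBBCode(source) });
  }

  view(id) {
    return this.posts.get(id).html; // served from the cache, no re-rendering
  }

  sourceForEditing(id) {
    return this.posts.get(id).source; // original text for the edit box
  }
}

const store = new PostStore();
store.save(1, "[b]Hi[/b] & bye\nok");
for (let i = 0; i < 1000; i++) store.view(1);
console.log(store.renders); // 1
console.log(store.view(1)); // <strong>Hi</strong> &amp; bye<br />ok
```

The trade-off is the one described in the post: roughly twice the storage per post in exchange for never re-rendering on read.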

theymos (Administrator, Legendary, Activity: 2814)
July 27, 2013, 12:43:24 AM (#7)

No, numbers aren't translated.


btcton (Hero Member, Activity: 924)
Professional SysAdmin / Hobbyist Developer
July 27, 2013, 07:45:05 PM (#8)

Quote from: dree12
> Quote from: theymos
>> SMF translates all special characters into HTML entities and all newlines into <br />s before inserting text into the database. This is maybe more efficient, though I probably wouldn't have done it this way.
>
> Are numbers translated too? If so, it would seem they are translated back...
>
> Anyway, I guess that's reasonable. Personally, I would have taken a storage hit and stored both a BBCode version in UTF-8 and a cached HTML translation. That would be the most efficient option speed-wise (one translation per edit rather than one on every page view), and storage is quite cheap, especially for text. IIRC that's what Wikipedia does, and it's a major reason why they can serve so many people so quickly with very few servers.

Only stuff that conflicts with HTML, such as "<", or sometimes with JavaScript, needs to be translated. Normal characters should be no more than one byte each.
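A sketch of escaping only the characters that clash with HTML markup, assuming the page is served as UTF-8 so every other character keeps its ordinary UTF-8 size. `escapeMinimal` and `utf8Bytes` are illustrative names; newlines are left untouched here, and how they become line breaks on display is outside this sketch.

```javascript
// Escape only what conflicts with HTML markup; everything else is left as-is
// and costs its normal UTF-8 size (1 byte for ASCII, up to 4 otherwise).
const HTML_SPECIAL = new Map([
  ["&", "&amp;"],
  ["<", "&lt;"],
  [">", "&gt;"],
  ['"', "&quot;"],
  ["'", "&#39;"],
]);

function escapeMinimal(text) {
  return text.replace(/[&<>"']/g, (ch) => HTML_SPECIAL.get(ch));
}

function utf8Bytes(text) {
  return new TextEncoder().encode(text).length;
}

const sample = "1, 2.\nx < y & ©";
console.log(utf8Bytes(escapeMinimal(sample))); // 23
```

Under this scheme a comma or a newline costs 1 byte rather than ~6; only "&", "<", ">", and quotes grow.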


Foxpup (Legendary, Activity: 1988)
July 28, 2013, 03:32:13 AM (#9)

Quote from: btcton
> Only stuff that conflicts with HTML, such as "<", or sometimes with JavaScript, needs to be translated. Normal characters should be no more than one byte each.

The forum doesn't use Unicode. All non-ASCII characters must be converted to the corresponding HTML entity (e.g., "©" becomes "&copy;" or "&#169;") in order to be displayed correctly. Without conversion, "©" will actually be displayed as "Â©". You've probably seen this before on sites that don't perform this conversion correctly.

Will pretend to do unverifiable things (while actually eating an enchilada-style burrito) for bitcoins: 1K6d1EviQKX3SVKjPYmJGyWBb1avbmCFM4
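The garbled "Â©" can be reproduced by encoding the text as UTF-8 and decoding the bytes as ISO-8859-1, in which byte 0xNN is simply code point U+00NN. A minimal sketch; `utf8ReadAsLatin1` is an illustrative name.

```javascript
// Reproduce the mojibake: encode text as UTF-8, then read each byte as an
// ISO-8859-1 character (byte 0xNN decodes to code point U+00NN).
function utf8ReadAsLatin1(text) {
  const bytes = new TextEncoder().encode(text); // "©" -> [0xC2, 0xA9]
  return Array.from(bytes, (b) => String.fromCharCode(b)).join("");
}

console.log(utf8ReadAsLatin1("©")); // Â©
console.log(utf8ReadAsLatin1("é")); // Ã©
```

Browsers actually decode pages labelled ISO-8859-1 as windows-1252 (per the WHATWG Encoding Standard); the two differ only for bytes 0x80 to 0x9F, so the result for these examples is the same.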

dree12 (Legendary, Activity: 1246)
July 28, 2013, 03:35:49 AM (#10)

Quote from: Foxpup
> Quote from: btcton
>> Only stuff that conflicts with HTML, such as "<", or sometimes with JavaScript, needs to be translated. Normal characters should be no more than one byte each.
>
> The forum doesn't use Unicode. All non-ASCII characters must be converted to the corresponding HTML entity (e.g., "©" becomes "&copy;" or "&#169;") in order to be displayed correctly. Without conversion, "©" will actually be displayed as "Â©". You've probably seen this before on sites that don't perform this conversion correctly.

Unicode should be UTF-8. Just a minor correction, as the forum does indeed use Unicode, but cannot encode most Unicode characters.

Foxpup (Legendary, Activity: 1988)
July 28, 2013, 03:49:33 AM (#11)

Quote from: dree12
> Unicode should be UTF-8. Just a minor correction, as the forum does indeed use Unicode, but cannot encode most Unicode characters.

The forum does not use UTF-8, or any other flavour of Unicode. It uses ISO-8859-1, or at least, that's how it serves its pages. If it actually does store posts in UTF-8 (or any other encoding), it would have to perform the conversion every time a page is requested, which seems rather wasteful.


justusranvier (Legendary, Activity: 1400)
July 28, 2013, 04:02:24 AM (#12)

Quote from: Foxpup
> The forum does not use UTF-8, or any other flavour of Unicode. It uses ISO-8859-1, or at least, that's how it serves its pages.

Really? In 2013?

nimda (Hero Member, Activity: 784)
0xFB0D8D1534241423
July 28, 2013, 04:16:47 AM (#13)

Quote from: justusranvier
> Quote from: Foxpup
>> The forum does not use UTF-8, or any other flavour of Unicode. It uses ISO-8859-1, or at least, that's how it serves its pages.
>
> Really? In 2013?

Code:
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
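What that declared charset means at the byte level, as a sketch: ISO-8859-1 gives exactly one byte per character and covers only U+0000 to U+00FF, so anything else has to travel as an ASCII numeric character reference. `encodeForLatin1Page` is a hypothetical helper; strictly, characters such as "©" (U+00A9) fit in a single byte under this charset, even though the posts above describe the forum converting all non-ASCII characters to entities.

```javascript
// One byte per character, covering only U+0000..U+00FF. Anything outside
// that range is emitted as a numeric character reference made of ASCII bytes.
function encodeForLatin1Page(text) {
  const bytes = [];
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    if (cp <= 0xff) {
      bytes.push(cp); // representable directly, e.g. "©" -> 0xA9
    } else {
      for (const c of `&#${cp};`) bytes.push(c.charCodeAt(0));
    }
  }
  return Uint8Array.from(bytes);
}

const encoded = encodeForLatin1Page("A©中");
console.log(encoded.length); // 10: "A" 1 + "©" 1 + "&#20013;" 8
```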


btcton (Hero Member, Activity: 924)
Professional SysAdmin / Hobbyist Developer
July 28, 2013, 04:21:51 AM (#14)

Quote from: Foxpup
> Quote from: btcton
>> Only stuff that conflicts with HTML, such as "<", or sometimes with JavaScript, needs to be translated. Normal characters should be no more than one byte each.
>
> The forum doesn't use Unicode. All non-ASCII characters must be converted to the corresponding HTML entity (e.g., "©" becomes "&copy;" or "&#169;") in order to be displayed correctly. Without conversion, "©" will actually be displayed as "Â©". You've probably seen this before on sites that don't perform this conversion correctly.

Oh, I see. That's weird; nowadays quite a few websites use Unicode.


Foxpup (Legendary, Activity: 1988)
July 28, 2013, 04:34:47 AM (#15)

Quote from: justusranvier
> Quote from: Foxpup
>> The forum does not use UTF-8, or any other flavour of Unicode. It uses ISO-8859-1, or at least, that's how it serves its pages.
>
> Really? In 2013?

Why not? HTML itself uses only plain ASCII characters, and HTML entities allow any other character to be represented in ASCII text. You could encode a Chinese-Klingon dictionary in ASCII using HTML entities if you really wanted to, though it would take a whopping 8 bytes per character.

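The "8 bytes per character" figure follows from the length of a decimal character reference: 2 bytes for `&#`, one per decimal digit of the code point, and 1 for `;`. A sketch with a hypothetical helper; the Klingon sample uses U+F8D0 from the unofficial ConScript private-use assignment for pIqaD.

```javascript
// Length in bytes of the decimal numeric character reference "&#N;" for a
// character: 2 for "&#", one per decimal digit of the code point, 1 for ";".
function decimalEntityBytes(ch) {
  return `&#${ch.codePointAt(0)};`.length;
}

const examples = {
  "A (U+0041)": "A",
  "中 (U+4E2D)": "中",
  "Klingon (U+F8D0, unofficial private-use mapping)": "\uF8D0",
  "😀 (U+1F600)": "😀",
};
for (const [label, ch] of Object.entries(examples)) {
  console.log(label, decimalEntityBytes(ch));
}
// A 5, 中 8, Klingon 8, 😀 9; the maximum, U+10FFFF, needs 10
```

Every code point from U+2710 (10000) to U+1869F (99999) has five decimal digits and therefore costs exactly 8 bytes, which covers the common CJK ideographs (U+4E00 to U+9FFF).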

dree12 (Legendary, Activity: 1246)
July 28, 2013, 02:23:56 PM (#16)

Quote from: Foxpup
> Quote from: justusranvier
>> Quote from: Foxpup
>>> The forum does not use UTF-8, or any other flavour of Unicode. It uses ISO-8859-1, or at least, that's how it serves its pages.
>>
>> Really? In 2013?
>
> Why not? HTML itself uses only plain ASCII characters, and HTML entities allow any other character to be represented in ASCII text. You could encode a Chinese-Klingon dictionary in ASCII using HTML entities if you really wanted to, though it would take a whopping 8 bytes per character.

Again, its character encoding doesn't support Unicode, but the forum does use Unicode. HTML entities are a form of Unicode encoding too.
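That view holds at least for numeric character references: `&#169;` and `&#xA9;` name the Unicode code point directly, so decoding them is just a code point lookup. A sketch; `decodeNumericEntities` is illustrative, and named entities such as `&copy;` would need a lookup table, so they are left untouched.

```javascript
// Decode decimal (&#169;) and hexadecimal (&#xA9;) character references to the
// Unicode characters they name. Named entities (&copy;) are left unchanged,
// as are references beyond U+10FFFF, which String.fromCodePoint would reject.
function decodeNumericEntities(text) {
  return text.replace(/&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/g, (match, hex, dec) => {
    const cp = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    return cp <= 0x10ffff ? String.fromCodePoint(cp) : match;
  });
}

console.log(decodeNumericEntities("&#169; &#x4E2D;")); // © 中
```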
Powered by MySQL, PHP, and SMF 1.1.19 | SMF © 2006-2009, Simple Machines