Title: Why is HTML being converted to the 0xA0 character instead of a space? Post by: theymos on March 21, 2013, 05:40:16 AM
Look at this page:
https://bitcointalk.org/test.php
The form is pre-filled with a &nbsp;. If I submit it, I get "a0", indicating that the browser sent the special 0xA0 "non-breaking space" character instead of a regular space. This isn't normal, and it's causing several problems for the forum software.
This behavior started when I upgraded php and switched to nginx before switching servers, and it's persisted after switching servers. So it's probably some problem with the configuration of nginx or php. Any ideas on how to fix this?
The code for test.php:
Code:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

Title: Re: Why is HTML being converted to the 0xA0 character instead of a space? Post by: gweedo on March 21, 2013, 05:48:36 AM
I am using php 5.3 on apache on mac OSX 10.8 (dev server), and just tried that snippet; it gives me the same thing... so this is probably a php 5+ problem.

Title: Re: Why is HTML being converted to the 0xA0 character instead of a space? Post by: gweedo on March 21, 2013, 05:51:09 AM
I just tried
Code:
<!DOCTYPE HTML>
which is HTML 5 and I got c2a0

Title: Re: Why is HTML being converted to the 0xA0 character instead of a space? Post by: theymos on March 21, 2013, 06:05:01 AM
I looked at the XHTML/HTML standards, and actually &nbsp; is defined as being 0xA0. But I'm pretty sure that my browser didn't use to behave this way...
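
The "a0" and "c2a0" results reported above are consistent with one and the same character, U+00A0, being encoded in two different charsets. A minimal sketch of that, using Python purely for illustration (it is not part of test.php or the forum software, and the helper name `hex_bytes` is made up for this example):

```python
import html

# The &nbsp; entity decodes to U+00A0, the non-breaking space.
NBSP = html.unescape("&nbsp;")


def hex_bytes(text: str, charset: str) -> str:
    """Return the bytes of `text` in `charset` as a lowercase hex string."""
    return text.encode(charset).hex()


print(hex_bytes(NBSP, "iso-8859-1"))  # a0   (single byte)
print(hex_bytes(NBSP, "utf-8"))       # c2a0 (two bytes)
print(hex_bytes(" ", "utf-8"))        # 20   (regular space, for comparison)
```

Which of the two byte sequences a browser submits presumably depends on the charset it assumes for the page, which would explain why changing the DOCTYPE changed the result.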

Title: Re: Why is HTML being converted to the 0xA0 character instead of a space? Post by: theymos on March 21, 2013, 06:18:49 AM
This causes problems because SMF converts multiple spaces into a series of &nbsp; entities. So " " becomes " ". When the entities are converted to 0xA0, it causes at least these problems:
- When quoting a PM which has multiple spaces, you will end up submitting a message containing 0xA0 characters. This confuses the forum's mail processing and the recipient ends up being sent a PM notification email with an empty message. If you receive a lot of PMs you've probably noticed this.
- There is no way for the forum to correctly display a clearsigned document if it has multiple spaces, even with [code] tags. Copying it will copy the non-breaking spaces (though only on some systems, I think) and it won't sign/verify consistently.
- I can't use the forum's file browser because submitting a file with any 0xA0s messes things up for some reason.

Title: Re: Why is HTML being converted to the 0xA0 character instead of a space? Post by: chmod755 on March 21, 2013, 07:07:26 AM
Did you try the same

Title: Re: Why is HTML being converted to the 0xA0 character instead of a space? Post by: 🏰 TradeFortress 🏰 on March 21, 2013, 08:18:02 AM
Did you try the same
It's intended behavior (nbsp -> 0xA0), but SMF doesn't like it.
What about removing SMF's multiple space to &nbsp; conversion?

Title: Re: Why is HTML being converted to the 0xA0 character instead of a space? Post by: davout on March 21, 2013, 08:22:36 AM
nbsp -> non-breakable space oO
That's like... by design

Title: Re: Why is HTML being converted to the 0xA0 character instead of a space? Post by: Bitsky on March 21, 2013, 06:36:55 PM
It's sent like that from the browser. Just run a tcpdump and you'll see.
Firefox/Opera sends the POST request string "test=%C2%A0", while curl sends "test= ".
http://en.wikipedia.org/wiki/Non-breaking_space#Encodings

Title: Re: Why is HTML being converted to the 0xA0 character instead of a space? Post by: Foxpup on March 22, 2013, 10:17:09 AM
This is exactly what's supposed to happen. The default value is a non-breaking space, so the browser sends a non-breaking space. What did you expect? No browser will send a regular space in this situation. Perhaps something in the old software was silently converting non-breaking spaces to regular spaces? Otherwise it should never have worked in the first place if non-breaking spaces are such a problem.
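
The "test=%C2%A0" body from the tcpdump can be reproduced without a browser. A rough sketch using Python's standard library, assuming only that the form field is named "test" as in the captured request; the helper `form_body` is hypothetical and only illustrates the percent-encoding step:

```python
from urllib.parse import urlencode

NBSP = "\u00a0"  # the character the &nbsp; entity stands for


def form_body(value: str, charset: str = "utf-8") -> str:
    """Encode a single 'test' field the way a form POST body is encoded."""
    return urlencode({"test": value}, encoding=charset)


print(form_body(NBSP))                # test=%C2%A0 (UTF-8 page)
print(form_body(NBSP, "iso-8859-1"))  # test=%A0    (ISO-8859-1 page)
print(form_body(" "))                 # test=+      (regular space)
```

Either way the server receives a non-breaking space, never a regular one; only the byte representation differs with the charset.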

Title: Re: Why is HTML being converted to the 0xA0 character instead of a space? Post by: theymos on March 23, 2013, 04:01:02 AM
Ha! I figured out what causes most of these problems. (Though I guess &nbsp; has been the special non-breaking space character in all browsers for at least several years.)
If omitted, the default value for this argument is ISO-8859-1 in versions of PHP prior to 5.4.0, and UTF-8 from PHP 5.4.0 onwards.
With PHP < 5.4.0, 0xA0 was passed through normally. Now, this character is considered invalid UTF-8 and the entire input to htmlspecialchars is scrapped.
This needs to be fixed in SMF. It affects even 2.x. I fixed it for the PM emails, but it'd be too much work to fix it everywhere. I'm getting pretty sick of dealing with SMF's escaping insanity... Who thought it was a good idea to have every function take strings with different degrees of escaping?
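
The reason a lone 0xA0 byte is rejected once the input is treated as UTF-8 can be shown at the byte level. The following is only a sketch in Python, not the fix that was applied to the forum and not SMF or PHP code; `is_valid_utf8` and `normalize_spaces` are hypothetical helpers, and the ISO-8859-1 fallback is an assumption made for the example:

```python
NBSP = "\u00a0"


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if `raw` decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def normalize_spaces(raw: bytes) -> str:
    """Decode as UTF-8, falling back to ISO-8859-1 (an assumption),
    then replace non-breaking spaces with regular spaces."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        text = raw.decode("iso-8859-1")
    return text.replace(NBSP, " ")


print(is_valid_utf8(b"a\xa0b"))      # False: 0xA0 alone is a stray continuation byte
print(is_valid_utf8(b"a\xc2\xa0b"))  # True: 0xC2 0xA0 is U+00A0 in UTF-8
print(normalize_spaces(b"a\xa0b"))   # a b
```

In PHP itself, the analogous options would be to pass the encoding argument to htmlspecialchars explicitly instead of relying on the version-dependent default, or to convert 0xA0 to a regular space before the call; which of these suits SMF is not something this sketch settles.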