Bitcoin Forum
Topic: Why is HTML &nbsp; being converted to the 0xA0 character instead of a space?
theymos (OP)
Administrator
March 21, 2013, 05:40:16 AM
 #1

Look at this page:
https://bitcointalk.org/test.php

The form is pre-filled with a &nbsp;. If I submit it, I get "a0", indicating that the browser sent the special 0xA0 "non-breaking space" character instead of a regular space. This isn't normal, and it's causing several problems for the forum software. The behavior started when I upgraded PHP and switched to nginx just before switching servers, and it has persisted after the move, so it's probably a problem with the nginx or PHP configuration.

Any ideas on how to fix this?

The code for test.php:
Code:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><head> <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
<title>asdf</title>
</head>
<body>
<form action="" method="post">
<input name="test" type="text" value="&nbsp;" />
<input type="submit" />
</form>
<?php
// Print the submitted value as hex so invisible characters are visible.
if (isset($_REQUEST['test'])) {
    echo '<p>' . bin2hex($_REQUEST['test']) . '</p>';
}
?>

</body></html>

1NXYoJ5xU91Jp83XfVMHwwTUyZFK64BoAD
gweedo
March 21, 2013, 05:48:36 AM
 #2

I'm using PHP 5.3 on Apache on Mac OS X 10.8 (dev server). I just tried that snippet and it gives me the same thing, so this is probably a PHP 5+ problem.
gweedo
March 21, 2013, 05:51:09 AM
 #3

I just tried

Code:
<!DOCTYPE HTML>
<html lang="en">
<head>
<title>asdf</title>
</head>
<body>
<form action="" method="post">
<input name="test" type="text" value="&nbsp;" />
<input type="submit" />
</form>
<?php
// Print the submitted value as hex so invisible characters are visible.
if (isset($_REQUEST['test'])) {
    echo '<p>' . bin2hex($_REQUEST['test']) . '</p>';
}
?>

</body></html>

which is HTML 5 and I got c2a0
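
The two results reported so far (a0 and c2a0) look like the same character, U+00A0, in two different byte encodings. A minimal sketch, assuming the first test page is submitted as ISO-8859-1 (as its meta tag declares) and the second as UTF-8; the variable names are illustrative only:

```php
<?php
// Decode the entity into bytes for each assumed page charset.
$latin1 = html_entity_decode('&nbsp;', ENT_QUOTES, 'ISO-8859-1');
$utf8   = html_entity_decode('&nbsp;', ENT_QUOTES, 'UTF-8');

echo bin2hex($latin1), "\n"; // a0   (one byte in ISO-8859-1)
echo bin2hex($utf8), "\n";   // c2a0 (two bytes in UTF-8)

// Roughly how the UTF-8 value appears in a form-encoded POST body.
$body = 'test=' . rawurlencode($utf8);
echo $body, "\n";            // test=%C2%A0
```

Either way the browser is sending a non-breaking space; only the byte representation differs with the charset.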
theymos (OP)
March 21, 2013, 06:05:01 AM
 #4

I looked at the XHTML/HTML standards, and &nbsp; actually is defined as being 0xA0. But I'm pretty sure that my browser didn't use to behave this way...

theymos (OP)
March 21, 2013, 06:18:49 AM
 #5

This causes problems because SMF converts multiple spaces into a series of &nbsp; entities. So "  " becomes "&nbsp; ". When the entities are converted to 0xA0, it causes at least these problems:
- When quoting a PM which has multiple spaces, you will end up submitting a message containing 0xA0 characters. This confuses the forum's mail processing and the recipient ends up being sent a PM notification email with an empty message. If you receive a lot of PMs you've probably noticed this.
- There is no way for the forum to correctly display a clearsigned document if it has multiple spaces, even with [code] tags. Copying it will copy the non-breaking spaces (though only on some systems, I think) and it won't sign/verify consistently.
- I can't use the forum's file browser because submitting a file with any 0xA0s messes things up for some reason.
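
One possible workaround for the problems listed above is to turn non-breaking spaces back into regular spaces before the input reaches the rest of the forum code. This is only a sketch: normalize_nbsp() is a hypothetical helper, not an SMF function, and it assumes the caller knows which charset the form was submitted in.

```php
<?php
/**
 * Hypothetical helper (not part of SMF): replace non-breaking spaces
 * with regular spaces in submitted text.
 */
function normalize_nbsp($text, $charset = 'ISO-8859-1')
{
    if (strtoupper($charset) === 'UTF-8') {
        // In UTF-8, U+00A0 is the two-byte sequence C2 A0.
        return str_replace("\xC2\xA0", ' ', $text);
    }
    // In ISO-8859-1, it is the single byte A0.
    return str_replace("\xA0", ' ', $text);
}

echo bin2hex(normalize_nbsp("a\xA0\xA0b")), "\n";          // 61202062
echo bin2hex(normalize_nbsp("a\xC2\xA0b", 'UTF-8')), "\n"; // 612062
```

The charset check matters: stripping a bare 0xA0 byte from UTF-8 text would corrupt any multi-byte character that happens to contain that byte.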

chmod755
March 21, 2013, 07:07:26 AM
Last edit: March 21, 2013, 08:45:44 AM by chmod755
 #6

Did you try the same

  • in a different browser?
  • with a different charset (UTF-8 or something)?

🏰 TradeFortress 🏰
March 21, 2013, 08:18:02 AM
 #7

Quote from: chmod755
Did you try the same
  • in a different browser?
  • with a different charset (UTF-8 or something)?

It's intended behavior (&nbsp; -> 0xA0), but SMF doesn't like it.

What about removing SMF's conversion of multiple spaces to &nbsp;?
davout
March 21, 2013, 08:22:36 AM
 #8

nbsp -> non-breaking space oO
That's like... by design

Bitsky
March 21, 2013, 06:36:55 PM
 #9

It's sent like that by the browser. Just run tcpdump and you'll see.

Firefox and Opera send the POST request string "test=%C2%A0", while curl sends "test=&nbsp;".

http://en.wikipedia.org/wiki/Non-breaking_space#Encodings
Foxpup
March 22, 2013, 10:17:09 AM
 #10

This is exactly what's supposed to happen. The default value is a non-breaking space, so the browser sends a non-breaking space. What did you expect? No browser will send a regular space in this situation. Perhaps something in the old software was silently converting non-breaking spaces to regular spaces? Otherwise it should never have worked in the first place if non-breaking spaces are such a problem.
theymos (OP)
March 23, 2013, 04:01:02 AM
 #11

Ha! I figured out what causes most of these problems. (Though I guess &nbsp; has been the special non-breaking space character in all browsers for at least several years.)

The PHP manual says this about the encoding argument of htmlspecialchars: "If omitted, the default value for this argument is ISO-8859-1 in versions of PHP prior to 5.4.0, and UTF-8 from PHP 5.4.0 onwards."

With PHP < 5.4.0, 0xA0 was passed through normally. Now a lone 0xA0 byte is considered invalid UTF-8, and the entire input to htmlspecialchars is scrapped (it returns an empty string).

This needs to be fixed in SMF. It affects even 2.x. I fixed it for the PM emails, but it'd be too much work to fix it everywhere.

I'm getting pretty sick of dealing with SMF's escaping insanity... Who thought it was a good idea to have every function take strings with different degrees of escaping?
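
The difference described above can be reproduced in a few lines. This is a sketch of the general behavior, not the fix that was applied to the forum; it passes the charset explicitly instead of relying on the version-dependent default, and the variable names are illustrative. ENT_SUBSTITUTE requires PHP 5.4.0 or later.

```php
<?php
// ISO-8859-1 text containing a non-breaking space (single byte A0).
$input = "a\xA0b";

// Treated as UTF-8, the lone A0 byte is invalid and the result is empty.
$as_utf8 = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

// With the charset stated explicitly, the byte passes through untouched.
$as_latin1 = htmlspecialchars($input, ENT_QUOTES, 'ISO-8859-1');

// ENT_SUBSTITUTE replaces the invalid byte with U+FFFD instead of
// discarding the whole string.
$substituted = htmlspecialchars($input, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

var_dump($as_utf8);                   // string(0) ""
echo bin2hex($as_latin1), "\n";       // 61a062
echo bin2hex($substituted), "\n";     // 61efbfbd62
```

Passing the charset on every call is tedious in a large codebase, which fits the remark that fixing it everywhere would be too much work.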