Bitcoin Forum
Topic: Why is HTML &nbsp; being converted to the 0xA0 character instead of a space?
theymos
Administrator
March 21, 2013, 05:40:16 AM
 #1

Look at this page:
https://bitcointalk.org/test.php

The form is pre-filled with a &nbsp;. If I submit it, I get "a0", indicating that the browser sent the special 0xA0 "non-breaking space" character instead of a regular space. This isn't normal, and it's causing several problems for the forum software. This behavior started when I upgraded PHP and switched to nginx before switching servers, and it has persisted after switching servers. So it's probably some problem with the configuration of nginx or PHP.

Any ideas on how to fix this?

The code for test.php:
Code:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><head> <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
<title>asdf</title>
</head>
<body>
<form action="" method="post">
<input name="test" type="text" value="&nbsp;" />
<input type="submit" />
</form>
<?php
// Hex-dump the submitted value so the exact bytes the browser sent are visible.
if (isset($_REQUEST['test'])) {
    echo '<p>' . bin2hex($_REQUEST['test']) . '</p>';
}
?>

</body></html>

gweedo
March 21, 2013, 05:48:36 AM
 #2

I'm running PHP 5.3 on Apache on Mac OS X 10.8 (dev server). I just tried that snippet and it gives me the same thing, so this is probably a PHP 5+ problem.

gweedo
March 21, 2013, 05:51:09 AM
 #3

I just tried

Code:
<!DOCTYPE HTML>
<html lang="en">
<head>
<title>asdf</title>
</head>
<body>
<form action="" method="post">
<input name="test" type="text" value="&nbsp;" />
<input type="submit" />
</form>
<?php
// Hex-dump the submitted value so the exact bytes the browser sent are visible.
if (isset($_REQUEST['test'])) {
    echo '<p>' . bin2hex($_REQUEST['test']) . '</p>';
}
?>

</body></html>

which is HTML5, and I got c2a0.
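
For what it's worth, "a0" and "c2a0" are the same character (U+00A0) in two different encodings: one byte in ISO-8859-1, two bytes in UTF-8. The first test page declares ISO-8859-1; this one declares no charset, so the browser presumably fell back to UTF-8 (that part is an assumption). A minimal sketch to see both byte sequences, using a helper name (nbsp_bytes) made up for this example:

```php
<?php
// Decode the &nbsp; entity into the given charset and hex-dump the bytes.
// nbsp_bytes() is a hypothetical helper for illustration only.
function nbsp_bytes($charset)
{
    return bin2hex(html_entity_decode('&nbsp;', ENT_QUOTES, $charset));
}

echo nbsp_bytes('ISO-8859-1'), "\n"; // a0
echo nbsp_bytes('UTF-8'), "\n";      // c2a0
```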

theymos
Administrator
March 21, 2013, 06:05:01 AM
 #4

I looked at the XHTML/HTML standards, and &nbsp; actually is defined as being 0xA0. But I'm pretty sure that my browser didn't use to behave this way...

theymos
Administrator
March 21, 2013, 06:18:49 AM
 #5

This causes problems because SMF converts multiple spaces into a series of &nbsp; entities. So "  " becomes "&nbsp; ". When the entities are converted to 0xA0, it causes at least these problems:
- When quoting a PM which has multiple spaces, you will end up submitting a message containing 0xA0 characters. This confuses the forum's mail processing and the recipient ends up being sent a PM notification email with an empty message. If you receive a lot of PMs you've probably noticed this.
- There is no way for the forum to correctly display a clearsigned document if it has multiple spaces, even with [code] tags. Copying it will copy the non-breaking spaces (though only on some systems, I think) and it won't sign/verify consistently.
- I can't use the forum's file browser because submitting a file with any 0xA0s messes things up for some reason.
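
One possible workaround for the cases above would be to turn non-breaking spaces back into regular spaces before the text reaches the mail or file-handling code. This is only a sketch: nbsp_to_space() is a hypothetical helper, not an SMF function, and it assumes you know which charset the submitted text is in.

```php
<?php
// Hypothetical helper (not part of SMF): replace non-breaking spaces with
// regular spaces in text submitted in the given charset.
function nbsp_to_space($text, $charset = 'ISO-8859-1')
{
    // U+00A0 is the single byte A0 in ISO-8859-1 and the pair C2 A0 in UTF-8.
    $nbsp = (strtoupper($charset) === 'UTF-8') ? "\xC2\xA0" : "\xA0";
    return str_replace($nbsp, ' ', $text);
}

echo bin2hex(nbsp_to_space("a\xA0 b")), "\n"; // 61202062
```

Replacing the bare A0 byte is only safe for single-byte charsets; in UTF-8 text that byte also appears inside other characters, which is why the sketch matches the full C2 A0 pair there.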

chmod755
March 21, 2013, 07:07:26 AM
 #6

Did you try the same

  • in a different browser?
  • with a different charset? (utf-8 or something)
TradeFortress
March 21, 2013, 08:18:02 AM
 #7

Quote from: chmod755
    Did you try the same

      • in a different browser?
      • with a different charset? (utf-8 or something)

It's intended behavior (&nbsp; -> 0xA0), but SMF doesn't like it.

What about removing SMF's conversion of multiple spaces to &nbsp;?
davout
March 21, 2013, 08:22:36 AM
 #8

&nbsp; -> non-breaking space oO
That's like... by design

Bitsky
March 21, 2013, 06:36:55 PM
 #9

It's sent like that by the browser. Just run tcpdump and you'll see.

Firefox/Opera send the POST request string "test=%C2%A0", while curl sends "test=&nbsp;".

http://en.wikipedia.org/wiki/Non-breaking_space#Encodings
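
If you don't want to run tcpdump, the request string can also be decoded by hand. A minimal sketch: posted_hex() is a made-up helper name, and the "%A0" body is my assumption about what a browser sends for an ISO-8859-1 page (it matches the "a0" result from the first post).

```php
<?php
// Parse a form-encoded body with parse_str() and hex-dump the "test" field.
// posted_hex() is a hypothetical helper for illustration only.
function posted_hex($body)
{
    parse_str($body, $fields);
    return bin2hex($fields['test']);
}

echo posted_hex('test=%C2%A0'), "\n"; // c2a0: non-breaking space in UTF-8
echo posted_hex('test=%A0'), "\n";    // a0: non-breaking space in ISO-8859-1
```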

Foxpup
March 22, 2013, 10:17:09 AM
 #10

This is exactly what's supposed to happen. The default value is a non-breaking space, so the browser sends a non-breaking space. What did you expect? No browser will send a regular space in this situation. Perhaps something in the old software was silently converting non-breaking spaces to regular spaces? Otherwise it should never have worked in the first place if non-breaking spaces are such a problem.

theymos
Administrator
March 23, 2013, 04:01:02 AM
 #11

Ha! I figured out what causes most of these problems. (Though I guess &nbsp; has been the special non-breaking space character in all browsers for at least several years.)

From the PHP manual, on the encoding argument of htmlspecialchars:
    If omitted, the default value for this argument is ISO-8859-1 in versions of PHP prior to 5.4.0, and UTF-8 from PHP 5.4.0 onwards.

With PHP < 5.4.0, 0xA0 was passed through normally. Now a lone 0xA0 byte is treated as invalid UTF-8, so htmlspecialchars scraps the entire input and returns an empty string.
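
A minimal sketch of the difference, assuming the forum's text is ISO-8859-1. The encoding is passed explicitly in both calls so the result doesn't depend on which PHP version's default applies:

```php
<?php
$text = "a\xA0b"; // ISO-8859-1 bytes: 'a', non-breaking space, 'b'

// Treated as UTF-8 (the default from PHP 5.4.0 on), the lone A0 byte is
// invalid, so the whole string comes back empty.
$as_utf8 = htmlspecialchars($text, ENT_COMPAT, 'UTF-8');

// Naming the encoding the text is actually in lets the byte through.
$as_latin1 = htmlspecialchars($text, ENT_COMPAT, 'ISO-8859-1');

var_dump($as_utf8 === '');     // bool(true)
var_dump(bin2hex($as_latin1)); // string(6) "61a062"
```

So one way to fix an individual call site is to pass the encoding argument explicitly, which is roughly what would have to be done throughout SMF.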

This needs to be fixed in SMF. It affects even 2.x. I fixed it for the PM emails, but it'd be too much work to fix it everywhere.

I'm getting pretty sick of dealing with SMF's escaping insanity... Who thought it was a good idea to have every function take strings with different degrees of escaping?
