Bug 14862

Summary: contentEditable inserts non standard spaces
Product: WebKit
Reporter: Peer Bremer <peer>
Component: HTML Editing
Assignee: Nobody <webkit-unassigned>
Status: RESOLVED INVALID
Severity: Normal
CC: ap, justin.garcia, mrowe, peer
Priority: P2
Version: 523.x (Safari 3)
Hardware: Mac
OS: OS X 10.4
URL: http://www.smileCMS.com/webkit/strange_spaces.txt
Attachments:
Description: just an editable div; Flags: none

Description Peer Bremer 2007-08-02 04:50:10 PDT
When editing text inside contenteditable divs, the browser now inserts non-standard spaces. These do not wrap in tables set to a fixed width, forcing the page to be wider than the current window. This happens especially in Safari 3.0.3 beta and did not happen in Safari 3.0.2 beta. The latest WebKit build renders the pages correctly but also inserts these non-standard spaces into the text. When viewed in BBEdit with Show Invisibles on, they are represented by a solid dot instead of the diamond that stands for spaces, and the Zap Gremlins function does remove these hidden characters.
Comment 1 Alexey Proskuryakov 2007-08-02 13:08:30 PDT
These are non-breaking spaces, and they are often necessary when editing HTML.

Since some code is broken by this, it's quite possible that this is not one of these cases. Could you please provide detailed steps to reproduce this issue on a live site (or alternatively an interactive test case)? It's unlikely that we'll be able to proceed with just the information in bug description.
Comment 2 Peer Bremer 2007-08-02 13:33:36 PDT
(In reply to comment #1)
> These are non-breaking spaces, and they are often necessary when editing HTML.

That is not the issue. If a correct "&nbsp;" were inserted, that would be perfectly acceptable; however, the browser inserts non-ASCII characters, which are not standard HTML.
Comment 3 Alexey Proskuryakov 2007-08-02 21:45:19 PDT
Created attachment 15814
just an editable div

This is what Safari was always doing, so the difference between 3.0.2 and 3.0.3 betas must be in something else. You can use the attached test to verify what is inserted in an editable div.
Comment 4 Mark Rowe (bdash) 2007-08-03 13:48:34 PDT
The character that is inserted is a non-breaking space, just as Alexey mentioned.  In the case of the URL you provided, the character on the second line between the quotation marks is a UTF-8 encoded non-breaking space.  If you change Safari's encoding via View -> Text Encodings, you will notice that it renders as expected.  It appears as an unexpected A-like character because it is being interpreted according to the default encoding of the browser, which is used when the server does not specify which encoding *should* be used.
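The A-like character can be reproduced with a short Python 3 sketch. It assumes the browser's fallback encoding is ISO-8859-1 (Latin-1); the comment above only says "the default encoding of the browser", so the exact default is an assumption. The UTF-8 form of U+00A0 is two bytes, and a Latin-1 decoder turns the first of them into a separate, visible character:

```python
import unicodedata

NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE, the character the editor inserts

# UTF-8 encodes U+00A0 as two bytes.
utf8_bytes = NBSP.encode("utf-8")

# Assumption: the browser falls back to Latin-1 when the server sends no charset.
# Latin-1 maps every byte to exactly one character, so two bytes become two characters.
misread = utf8_bytes.decode("latin-1")

# Decoding with the encoding that was actually used recovers the single character.
correct = utf8_bytes.decode("utf-8")

print(utf8_bytes)                              # b'\xc2\xa0'
print([unicodedata.name(c) for c in misread])  # ['LATIN CAPITAL LETTER A WITH CIRCUMFLEX', 'NO-BREAK SPACE']
print([unicodedata.name(c) for c in correct])  # ['NO-BREAK SPACE']
```

Choosing the right encoding under View -> Text Encodings corresponds to the `correct` line: the bytes are unchanged, only their interpretation differs.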

In Alexey's example entering "A  B" (that is, A followed by two spaces then a B) is displayed in the alert as "A  B (A%20%A0B)". Plugging this into a short code snippet to display the names of the characters that compose that string reveals:
>>> import unicodedata
>>> string = b"A\x20\xA0B"
>>> [unicodedata.name(c) for c in string.decode('latin1')]
['LATIN CAPITAL LETTER A', 'SPACE', 'NO-BREAK SPACE', 'LATIN CAPITAL LETTER B']

I think this bug should probably be closed as everything appears to be working as expected.
Comment 5 Peer Bremer 2007-08-03 14:47:01 PDT
Thank you for the explanation. I might be a bit dumb, but I do not think it is a good idea for the browser to use non-standard characters when generating HTML in a contentEditable div. For the sake of compatibility, shouldn't the browser produce clean, standard HTML code? There are already a lot of Apple-style tags, colors specified as rgb values, and other silly HTML like breaks (<br>) wrapped inside DIV tags, etc. But at least these are visible text characters and HTML.
Anyhow, I accept that this is not a bug. Still, these invisible characters mess up the page structure, causing tables with width specifications to be wider than specified because the lines do not break.
Comment 6 Peer Bremer 2007-08-03 14:52:41 PDT
I have looked at the test again, and you are right about what happens if you use two spaces,
but I think it would be much better to use &nbsp; in these cases and not some invisible non-HTML character.
Comment 7 Mark Rowe (bdash) 2007-08-03 15:05:47 PDT
The invisible character *is* a &nbsp;.  I'm not sure why you keep saying that it is non-standard.  Take a look at <http://www.w3.org/TR/REC-html40/sgml/entities.html>:

<!ENTITY nbsp   CDATA "&#160;" -- no-break space = non-breaking space,
                                  U+00A0 ISOnum -->

Notice U+00A0?  That's the character code you see in Alexey's demo.  It is the *same* character.
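One quick way to confirm that the named entity, the numeric references and the raw character are interchangeable is Python 3's standard `html.unescape`. This only illustrates the entity table quoted above; it is not anything WebKit itself runs:

```python
import html
import unicodedata

# Three spellings of the same character in HTML source.
forms = ["&nbsp;", "&#160;", "&#xA0;"]

for form in forms:
    char = html.unescape(form)
    # Each spelling decodes to the single code point U+00A0 (decimal 160).
    print(form, hex(ord(char)), unicodedata.name(char))
```

All three lines report `0xa0 NO-BREAK SPACE`, which is why writing the raw character or `&nbsp;` makes no difference to how the document is parsed.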

If you are seeing it rendered visibly as anything but a non-breaking space (eg, the A-like character) then you have a character encoding issue, most likely in your web server or application configuration.

> these invisible chars mess up the page structure causing tables with
> width specifications to be wider than specified since the lines do not break

If you have a specific case where this is occurring, please provide an example.  WebKit typically inserts alternating pairs of space and non-breaking space when editing which is sufficient to be visually consistent with what the user has typed while also allowing wrapping to occur.
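The alternating pattern can be illustrated with a simplified Python sketch. `rebalance_spaces` and `rebalance_run` are hypothetical names, and the rules used here (alternate ordinary space and NBSP within a run of spaces; use NBSP at the very start or end of the text, where an ordinary space would collapse) are assumptions for illustration, not WebKit's actual editing code:

```python
import re

NBSP = "\u00a0"

def rebalance_run(length, at_start, at_end):
    """Hypothetical helper: build a run of `length` spaces that renders at full
    width but never places two ordinary spaces next to each other."""
    chars = []
    for i in range(length):
        # At the start of the text, begin with NBSP so the run cannot collapse;
        # elsewhere, begin with an ordinary space and alternate from there.
        use_nbsp = (i % 2 == 0) if at_start else (i % 2 == 1)
        chars.append(NBSP if use_nbsp else " ")
    # A trailing ordinary space would also collapse, so end with NBSP there.
    if at_end and chars and chars[-1] == " ":
        chars[-1] = NBSP
    return "".join(chars)

def rebalance_spaces(text):
    """Hypothetical: rewrite every run of ordinary spaces in `text`."""
    def replace(match):
        return rebalance_run(len(match.group(0)),
                             match.start() == 0,
                             match.end() == len(text))
    return re.sub(r" +", replace, text)

print(repr(rebalance_spaces("A  B")))  # 'A \xa0B'
```

For "A  B" this yields the same A%20%A0B sequence seen in Alexey's test. Because at least every other character in a run is still an ordinary space, the line breaker keeps somewhere to wrap, which is the point made above about wrapping still being possible.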
Comment 8 Alexey Proskuryakov 2007-12-28 11:15:34 PST
This report doesn't give steps to reproduce a bug or demonstrate an incompatibility with a web site; closing.

Note that we track a request to serialize non-breaking spaces as &nbsp; in bug 11947. I'm not making this a duplicate, because this report also mentions some difference between 3.0.2 and 3.0.3 betas that we couldn't pin down.
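The serialization requested in bug 11947 could look roughly like the following Python sketch. The hypothetical `serialize_text` escapes markup characters first and then writes U+00A0 out as the named entity; escaping must come first, otherwise the `&` in `&nbsp;` would itself be turned into `&amp;`. This illustrates the idea only and is not how WebKit serializes markup:

```python
import html

NBSP = "\u00a0"

def serialize_text(text):
    """Hypothetical serializer: escape &, <, > and spell U+00A0 as &nbsp;."""
    # Escape the markup-significant characters before inserting any entities.
    escaped = html.escape(text, quote=False)
    # The raw NBSP becomes the visible named entity in the source.
    return escaped.replace(NBSP, "&nbsp;")

print(serialize_text("A \u00a0B"))  # A &nbsp;B
```

Since `&nbsp;` and U+00A0 are the same character, `html.unescape` turns the output back into the original string; only the visibility of the character in the source changes.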