Bug 95548 - Some whitespace characters and joiners are not rendered as entities in MarkupAccumulator
Summary: Some whitespace characters and joiners are not rendered as entities in Markup...
Status: RESOLVED INVALID
Alias: None
Product: WebKit
Classification: Unclassified
Component: HTML Editing
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
Importance: P2 Normal
Assignee: Alexander Pavlov (apavlov)
URL:
Keywords:
Depends on:
Blocks: 93888
Reported: 2012-08-31 03:32 PDT by Alexander Pavlov (apavlov)
Modified: 2012-09-01 01:28 PDT
CC List: 6 users

See Also:


Attachments
Patch (12.29 KB, patch)
2012-08-31 05:47 PDT, Alexander Pavlov (apavlov)
no flags

Description Alexander Pavlov (apavlov) 2012-08-31 03:32:08 PDT
The Firebug source at http://getfirebug.com/developer/api/firebug1.6X/symbols/src/content_firebug_lib.js.html (lines 2186-2194) lists a few "invisible" characters that have corresponding HTML entities. MarkupAccumulator, however, never renders some of these entities in the markup it generates for text nodes. Below are the Unicode characters, each with the entity that should be generated for it:

0x2002 - 'ensp'
0x2003 - 'emsp'
0x2009 - 'thinsp'
0x200c - 'zwnj'
0x200d - 'zwj'
0x200e - 'lrm'
0x200f - 'rlm'
0x200b - '#8203'
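A minimal Python sketch of the escaping this patch proposes, for illustration only: it is not WebKit's MarkupAccumulator code, the name `escape_text_proposed` is hypothetical, and it assumes the escapes for `&`, `<`, `>` and U+00A0 stay in place alongside the new entries from the list above.

```python
# Hypothetical illustration of the proposal in this bug; not WebKit code.
# str.translate does a single pass, so the "&" inside "&ensp;" etc. is never re-escaped.
_PROPOSED_TABLE = str.maketrans({
    # Assumed pre-existing escapes.
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    "\u00a0": "&nbsp;",
    # Entries proposed in this bug.
    "\u2002": "&ensp;",
    "\u2003": "&emsp;",
    "\u2009": "&thinsp;",
    "\u200c": "&zwnj;",
    "\u200d": "&zwj;",
    "\u200e": "&lrm;",
    "\u200f": "&rlm;",
    "\u200b": "&#8203;",  # zero-width space: a numeric reference, as listed above
})


def escape_text_proposed(text: str) -> str:
    """Escape a text node's data using the proposed entity table."""
    return text.translate(_PROPOSED_TABLE)
```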
Comment 1 Alexander Pavlov (apavlov) 2012-08-31 05:47:31 PDT
Created attachment 161676 [details]
Patch
Comment 2 Alexey Proskuryakov 2012-08-31 10:18:04 PDT
Please elaborate on why these characters should be encoded, and whether that matches other browsers.

Also, please test what happens with XML - it's not OK to use these named entities in raw XML documents.
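A minimal Python sketch of the XML concern, using the standard `xml.sax.saxutils.escape` and `xml.etree.ElementTree` modules as stand-ins for a serializer and a strict XML parser; the helper names are hypothetical. XML predefines only `&amp;`, `&lt;`, `&gt;`, `&apos;` and `&quot;`, so a named reference such as `&zwj;` is an undefined entity in a document without a DTD, while a numeric reference such as `&#8205;` is well-formed in both HTML and XML.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Code points from the list in the description.
INVISIBLE_CODE_POINTS = [0x2002, 0x2003, 0x2009, 0x200B, 0x200C, 0x200D, 0x200E, 0x200F]

# Numeric character references need no DTD, so they are valid in raw XML.
NUMERIC_REFS = {chr(cp): f"&#{cp};" for cp in INVISIBLE_CODE_POINTS}


def escape_text_xml_safe(text: str) -> str:
    """Escape &, > and <, then replace the invisible characters with numeric references."""
    # escape() handles "&" first, so the "&" in "&#8205;" is not re-escaped.
    return escape(text, NUMERIC_REFS)


def parses_as_xml(markup: str) -> bool:
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False
```

With this sketch, `parses_as_xml("<p>a&zwj;b</p>")` should be False, while wrapping `escape_text_xml_safe("a\u200db")` in the same `<p>` element should parse.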
Comment 3 Darin Adler 2012-08-31 12:05:10 PDT
Comment on attachment 161676 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=161676&action=review

> Source/WebCore/ChangeLog:16
> +        Add the following entities to be rendered in the generated markup:
> +        - &ensp;
> +        - &emsp;
> +        - &thinsp;
> +        - &zwnj;
> +        - &zwj;
> +        - &lrm;
> +        - &rlm;
> +        - &#8203; (zero-width space, '\u200b')

Why? Does the specification say that these need to be entities and not just characters?
Comment 4 Alexander Pavlov (apavlov) 2012-08-31 22:57:28 PDT
(In reply to comment #3)
> (From update of attachment 161676 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=161676&action=review

[...]

> Why? Does the specification say that these need to be entities and not just characters?

Aha, I finally found the spec and take my words back! Sorry for the noise.
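The comment does not say which spec was found. Assuming it is the HTML fragment serialization algorithm, its text-mode "escaping a string" step replaces only `&`, U+00A0, `<` and `>`, so characters such as U+200D pass through unchanged. A minimal Python sketch of that step (the function name is hypothetical):

```python
def escape_text_per_html_spec(text: str) -> str:
    """Sketch of text-mode (non-attribute) escaping from HTML fragment serialization."""
    # "&" must be replaced first so the "&" in "&nbsp;", "&lt;" and "&gt;" is kept.
    return (
        text.replace("&", "&amp;")
        .replace("\u00a0", "&nbsp;")
        .replace("<", "&lt;")
        .replace(">", "&gt;")
    )
```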
Comment 5 Benjamin Poulain 2012-08-31 23:43:09 PDT
Comment on attachment 161676 [details]
Patch

> > Why? Does the specification say that these need to be entities and not just characters?
> 
> Aha, I finally found the spec and take my words back! Sorry for the noise.

I assume you also meant to clear the review flag?