Summary: | &lang; and &rang; entities are mapped to the incorrect Unicode codepoint | |
---|---|---|---
Product: | WebKit | Reporter: | Mark Rowe (bdash) <mrowe>
Component: | WebCore Misc. | Assignee: | Nobody <webkit-unassigned>
Status: | RESOLVED FIXED | |
Severity: | Normal | CC: | ap
Priority: | P2 | |
Version: | 420+ | |
Hardware: | Mac | |
OS: | OS X 10.4 | |
Description
Mark Rowe (bdash)
2006-10-29 03:56:49 PST
Actually, the correspondence is a bit different: lang should be U+2329 according to the DTDs, but that character is deprecated in Unicode. Its canonical equivalent is U+3008.

<!ENTITY lang "&#9001;"> <!-- left-pointing angle bracket = bra, U+2329 ISOtech -->

According to <http://www.w3.org/TR/charmod-norm/>, text on the Web should be in canonical precomposed form (NFC). We currently do this canonicalization for XHTML, but not for HTML.

I stuffed up the initial description of this bug completely. WebKit's current behaviour is to map "lang" to U+2329 for HTML but to U+3008 for XHTML, and to map "rang" to U+232A for HTML but to U+3009 for XHTML. The behaviour defined in the HTML and XHTML DTDs is to map "lang" to U+2329 and "rang" to U+232A. Our behaviour is therefore technically incorrect for XHTML, but, as Alexey mentions, U+2329 and U+232A are deprecated. This means that U+3008/U+3009 are arguably "more right"... or something.

Created attachment 11368
proposed patch
OK, I'm not really convinced myself, but we should either go this way or make XHTML work as HTML for these entities...
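The DTD-versus-canonical mapping discussed above can be checked with Python's standard library. This is a minimal sketch, assuming Python 3.9+, and is not WebKit code: `html.entities.name2codepoint` is built from the HTML 4.01 entity list, so it carries the DTD values, and `unicodedata.normalize("NFC", ...)` applies the canonicalization that charmod-norm asks for. The helper `dtd_and_canonical` is a hypothetical name used only for illustration.

```python
import unicodedata
from html.entities import name2codepoint


def dtd_and_canonical(entity: str) -> tuple[int, int]:
    """Return (DTD codepoint, NFC-normalized codepoint) for a named entity.

    Hypothetical helper for illustration; not WebKit code.
    """
    # name2codepoint is built from the HTML 4.01 entity definitions,
    # so it carries the DTD values (U+2329 for lang, U+232A for rang).
    dtd_cp = name2codepoint[entity]
    # U+2329/U+232A have singleton canonical decompositions, so NFC
    # replaces them with U+3008/U+3009 instead of keeping them.
    nfc = unicodedata.normalize("NFC", chr(dtd_cp))
    return dtd_cp, ord(nfc)


for name in ("lang", "rang"):
    dtd_cp, nfc_cp = dtd_and_canonical(name)
    ch = chr(dtd_cp)
    print(f"&{name};  DTD U+{dtd_cp:04X} {unicodedata.name(ch)}"
          f"  -> NFC U+{nfc_cp:04X} {unicodedata.name(chr(nfc_cp))}"
          f"  (decomposition: {unicodedata.decomposition(ch)})")
```

The two columns correspond to the two behaviours in this bug: the DTD value is what WebKit produced for HTML, and the NFC value is what it produced for XHTML.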
Comment on attachment 11368
proposed patch
I'm convinced.
r=me
Does this affect any other test results?
Committed revision 17591. No other tests were affected.

Fixed this in the HTML5 spec too.