Bug 11448 - &lang; and &rang; entities are mapped to the incorrect Unicode codepoint
Summary: &lang; and &rang; entities are mapped to the incorrect Unicode codepoint
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc.
Version: 420+
Hardware: Mac OS X 10.4
Importance: P2 Normal
Assignee: Nobody
Depends on:
Reported: 2006-10-29 03:56 PST by Mark Rowe (bdash)
Modified: 2007-06-14 16:22 PDT

proposed patch (383.66 KB, patch)
2006-11-03 13:57 PST, Alexey Proskuryakov
mjs: review+

Description Mark Rowe (bdash) 2006-10-29 03:56:49 PST
According to the DTDs for HTML 4 (http://www.w3.org/TR/html4/HTMLsymbol.ent) and XHTML 1 (http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent), the lang and rang entities should correspond to U+3008 and U+3009 respectively. The DTDs include this comment:

<!-- lang is NOT the same character as U+003C 'less than' or U+2039 'single left-pointing angle quotation mark' -->

Currently lang and rang incorrectly end up as U+2039 and U+203A.
Comment 1 Alexey Proskuryakov 2006-10-29 04:07:33 PST
Actually, the correspondence is a bit different: lang should be U+2329 according to the DTDs, but that character is deprecated in Unicode. Its canonical equivalent is U+3008.

<!ENTITY lang     "&#9001;"> <!-- left-pointing angle bracket = bra,
                                     U+2329 ISOtech -->

According to <http://www.w3.org/TR/charmod-norm/>, text on the Web should be in canonical precomposed form. We currently do this canonicalization for XHTML, but not for HTML.
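
To illustrate the canonicalization described above, here is a minimal Python sketch (not WebKit code). It assumes NFC normalization from the standard `unicodedata` module as a stand-in for "canonical precomposed form"; the names `lang_dtd`, `rang_dtd`, and `to_nfc` are made up for this example.

```python
import unicodedata

# "&#9001;" in the DTD is a decimal character reference: 9001 == 0x2329.
# rang is the next code point, 9002 == 0x232A.
lang_dtd = chr(9001)
rang_dtd = chr(9002)


def to_nfc(text: str) -> str:
    """Return text in Unicode Normalization Form C (canonical composition)."""
    return unicodedata.normalize("NFC", text)


# U+2329 and U+232A have singleton canonical decompositions,
# so normalization replaces them with U+3008 and U+3009.
lang_nfc = to_nfc(lang_dtd)
rang_nfc = to_nfc(rang_dtd)

for name, before, after in (("lang", lang_dtd, lang_nfc),
                            ("rang", rang_dtd, rang_nfc)):
    print(f"{name}: U+{ord(before):04X} -> U+{ord(after):04X}")
```

This only shows why a normalizing code path and a non-normalizing one end up with different code points for the same entity.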
Comment 2 Mark Rowe (bdash) 2006-10-29 04:14:37 PST
I stuffed up the initial description of this bug completely.

WebKit's current behaviour is to map "lang" to U+2329 for HTML but to U+3008 in XHTML.  It maps "rang" to U+232A for HTML but to U+3009 for XHTML.

The behaviour as defined in the HTML and XHTML DTDs is to map "lang" to U+2329 and "rang" to U+232A.  Our behaviour is therefore technically incorrect for XHTML, but as Alexey mentions, U+2329 and U+232A are deprecated.  This means that U+3008/U+3009 are arguably "more right"... or something.
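
The two behaviours described in this comment can be modelled roughly as follows. This is a hypothetical sketch, not how WebKit implements entity lookup: `resolve_entity` and its `canonicalize` flag are invented for illustration, and Python's `html.entities.name2codepoint` (the HTML 4 entity table) stands in for the DTD definitions.

```python
import unicodedata
from html.entities import name2codepoint  # HTML 4 entity names -> code points


def resolve_entity(name: str, canonicalize: bool) -> str:
    """Hypothetical helper: look up an entity, optionally applying NFC."""
    ch = chr(name2codepoint[name])
    return unicodedata.normalize("NFC", ch) if canonicalize else ch


def codepoint(ch: str) -> str:
    """Format a single character as U+XXXX."""
    return f"U+{ord(ch):04X}"


# Without canonicalization (the HTML behaviour described above).
html_result = {n: codepoint(resolve_entity(n, False)) for n in ("lang", "rang")}

# With canonicalization (the XHTML behaviour described above).
xhtml_result = {n: codepoint(resolve_entity(n, True)) for n in ("lang", "rang")}

print("HTML: ", html_result)
print("XHTML:", xhtml_result)
```

The only difference between the two paths is the normalization step, which is the inconsistency this bug is about.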
Comment 3 Alexey Proskuryakov 2006-11-03 13:57:36 PST
Created attachment 11368
proposed patch

OK, I'm not really convinced myself, but we should either go this way or make XHTML work as HTML for these entities...
Comment 4 Maciej Stachowiak 2006-11-03 16:04:43 PST
Comment on attachment 11368
proposed patch

I'm convinced.


Does this affect any other test results?
Comment 5 Alexey Proskuryakov 2006-11-04 00:03:21 PST
Committed revision 17591. No other tests were affected.
Comment 6 Ian 'Hixie' Hickson 2007-06-14 16:22:57 PDT
Fixed this in the HTML5 spec too.