According to the DTDs for HTML 4 (http://www.w3.org/TR/html4/HTMLsymbol.ent) and XHTML 1 (http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent), the lang and rang entities should correspond to U+3008 and U+3009 respectively. The DTD comment states:
<!-- lang is NOT the same character as U+003C 'less than' or U+2039 'single left-pointing angle quotation mark' -->
Currently lang and rang incorrectly end up as U+2039 and U+203A.
Actually, the correspondence is a bit different: lang should be U+2329 according to the DTDs, but that character is deprecated in Unicode. Its canonical equivalent is U+3008.
<!ENTITY lang "&#9001;"> <!-- left-pointing angle bracket = bra, U+2329 ISOtech -->
According to <http://www.w3.org/TR/charmod-norm/>, text on the Web should be in canonical precomposed form. We currently do this canonicalization for XHTML, but not for HTML.
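The canonical equivalence mentioned above can be checked with any Unicode library. The following is a minimal Python sketch, not WebKit code; the `DEPRECATED` table and the `canonical_form` helper are names made up for illustration. It assumes the standard `unicodedata` module and relies on U+2329 and U+232A having singleton canonical decompositions in the Unicode Character Database.

```python
import unicodedata

# Code points the DTDs assign to the two entities (deprecated in Unicode).
DEPRECATED = {"lang": "\u2329", "rang": "\u232A"}


def canonical_form(ch: str) -> str:
    """Return the NFC (canonical precomposed) form of a single character."""
    return unicodedata.normalize("NFC", ch)


for name, ch in DEPRECATED.items():
    canon = canonical_form(ch)
    # decomposition() exposes the raw canonical mapping from the UCD.
    mapping = unicodedata.decomposition(ch)
    print(f"{name}: U+{ord(ch):04X} -> U+{ord(canon):04X} (UCD mapping: {mapping})")
```

Because the decomposition is a singleton, normalization never composes back to U+2329 or U+232A, so any text passed through NFC ends up with U+3008 and U+3009.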
I stuffed up the initial description of this bug completely.
WebKit's current behaviour is to map "lang" to U+2329 for HTML but to U+3008 in XHTML. It maps "rang" to U+232A for HTML but to U+3009 for XHTML.
The behaviour as defined in the HTML and XHTML DTDs is to map "lang" to U+2329 and "rang" to U+232A. Our behaviour is therefore technically incorrect for XHTML, but as Alexey mentions, U+2329 and U+232A are deprecated. This means that U+3008/U+3009 are arguably "more right"... or something.
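To make the two behaviours concrete, here is a hedged Python sketch that models them; it is not how WebKit resolves entities. It assumes Python's `html.entities.name2codepoint` table, which carries the HTML 4 entity definitions, as a stand-in for the DTD values. `resolve_entity` and its `canonicalize` flag are hypothetical names for illustration.

```python
import unicodedata
from html.entities import name2codepoint  # HTML 4 entity table from the stdlib


def resolve_entity(name: str, canonicalize: bool) -> str:
    """Map an entity name to a character, optionally applying NFC.

    canonicalize=False models the HTML behaviour described above (DTD value);
    canonicalize=True models the XHTML behaviour (canonical form).
    """
    ch = chr(name2codepoint[name])
    return unicodedata.normalize("NFC", ch) if canonicalize else ch


for name in ("lang", "rang"):
    html_ch = resolve_entity(name, canonicalize=False)
    xhtml_ch = resolve_entity(name, canonicalize=True)
    print(f"{name}: HTML U+{ord(html_ch):04X}, XHTML U+{ord(xhtml_ch):04X}")
```

Making the two paths consistent amounts to choosing one value of the flag for both document types.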
Created attachment 11368
OK, I'm not really convinced myself, but we should either go this way or make XHTML work as HTML for these entities...
Comment on attachment 11368
Does this affect any other test results?
Committed revision 17591. No other tests were affected.
Fixed this in the HTML5 spec too.