Summary: | Make the canonical names (of TextEncoding) robust to changes in ICU's alias table | ||
---|---|---|---|
Product: | WebKit | Reporter: | Jungshik Shin <jshin> |
Component: | Platform | Assignee: | Nobody <webkit-unassigned> |
Status: | NEW --- | ||
Severity: | Normal | CC: | ap, darin, eric |
Priority: | P2 | ||
Version: | 528+ (Nightly build) | ||
Hardware: | All | ||
OS: | All |
Description
Jungshik Shin
2009-08-26 17:10:41 PDT
In theory I like the idea of TextEncoding always using the "canonical" capitalization of charset names, if such a thing existed. Lacking that, lowercasing all the names and changing the tests sounds OK to me, as long as it doesn't affect performance.

Won't this change what JavaScript code sees as document.charset? If so, there's certain potential for negative web site compatibility effects - which is difficult to justify by ease of writing regression tests in my opinion.

(In reply to comment #2)
> Won't this change what JavaScript code sees as document.charset? If so, there's
> certain potential for negative web site compatibility effects - which is
> difficult to justify by ease of writing regression tests in my opinion.

In theory, all that JS code and the server-side code behind it should match charset names case-insensitively, but in practice, you're right that there's a risk. I'll see what other browsers emit for document.charset (capital or lowercase).

It seems that only GB18030 vs gb18030 has changed in ICU 4.2. An alternative to lowercasing all charset names is to special-case GB18030 vs gb18030 in TextCodecICU.cpp, probably enclosed in an #ifdef (to make it ICU version-specific).

See also: bug 125225.
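The special-casing alternative mentioned above could look roughly like the following. This is only a minimal sketch, not the code in TextCodecICU.cpp: the helper names `equalIgnoringASCIICase` and `pinnedCanonicalName` and the macro `SKETCH_PIN_GB18030` are hypothetical, and the choice of "GB18030" (uppercase) as the spelling to keep is an assumption made for illustration. Real code would presumably base the guard on ICU's version macros (`U_ICU_VERSION_MAJOR_NUM` and `U_ICU_VERSION_MINOR_NUM` from `<unicode/uvernum.h>`) rather than a local macro.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical switch standing in for an ICU version check. Real code would
// test U_ICU_VERSION_MAJOR_NUM / U_ICU_VERSION_MINOR_NUM instead.
#ifndef SKETCH_PIN_GB18030
#define SKETCH_PIN_GB18030 1
#endif

// ASCII case-insensitive comparison of two charset names.
bool equalIgnoringASCIICase(const std::string& a, const std::string& b)
{
    if (a.size() != b.size())
        return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        int ca = std::tolower(static_cast<unsigned char>(a[i]));
        int cb = std::tolower(static_cast<unsigned char>(b[i]));
        if (ca != cb)
            return false;
    }
    return true;
}

// Hypothetical helper: takes the canonical name reported by the converter
// library and returns the spelling to expose. Only GB18030 is pinned (to an
// assumed uppercase spelling); every other name passes through unchanged.
std::string pinnedCanonicalName(const std::string& nameFromICU)
{
#if SKETCH_PIN_GB18030
    if (equalIgnoringASCIICase(nameFromICU, "GB18030"))
        return "GB18030";
#endif
    return nameFromICU;
}
```

With a helper of this shape, the name exposed to callers would stay the same whichever capitalization the alias table returns, so document.charset would not change for that one encoding and the other names would be left alone.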