WebKit Bugzilla
NEW
Bug 28760
Make the canonical names (of TextEncoding) robust to changes in ICU's alias table
https://bugs.webkit.org/show_bug.cgi?id=28760
Jungshik Shin
Reported
2009-08-26 17:10:41 PDT
In ICU 4.2, the gb18030 entry in convrtrs.txt changed from

gb18030 { IANA* } ibm-1392 { IBM* } windows-54936 { WINDOWS* }

to

gb18030 { IANA* } ibm-1392 { IBM* } windows-54936 { WINDOWS* } GB18030 { MIME* }

Note that 'GB18030' (uppercase) was added as the MIME name for gb18030. Because WebKit gives higher precedence to MIME names, it now picks up GB18030 as the canonical name.

Chromium has some tests that do case-sensitive comparison of charset names (WebKit layout tests have some, too, e.g. 'EUC-JP'). Chromium also has some unit tests (DOM serialization) and UI tests (encoding menu test) that compare the textual contents of two files. Those files include a 'meta charset' label generated by the DOM serializer, which uses the canonical name of TextEncoding to generate it.

It's possible to track down all the places where WebKit 'clients' use TextEncoding::name() and lowercase the return value there, but it may be better to make the canonical name of TextEncoding always lowercase. If we do, we have to change the expected results of a few layout tests.
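For illustration, the two options above (lowercasing the canonical name, or comparing names case-insensitively in the callers) could look roughly like the sketch below. This is not WebKit's code: `toASCIILower`, `equalIgnoringASCIICase`, and `checkCharsetNameHelpers` are hypothetical helpers over `std::string`, and the sketch assumes charset names are pure ASCII.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not a WebKit API): lowercase only the ASCII letters,
// so the result does not depend on the process locale.
std::string toASCIILower(std::string name)
{
    for (char& c : name) {
        if (c >= 'A' && c <= 'Z')
            c = static_cast<char>(c - 'A' + 'a');
    }
    return name;
}

// Hypothetical helper: what a test or client could use instead of a
// case-sensitive comparison of charset names.
bool equalIgnoringASCIICase(const std::string& a, const std::string& b)
{
    return toASCIILower(a) == toASCIILower(b);
}

// Small self-check using the names mentioned in this report.
void checkCharsetNameHelpers()
{
    assert(toASCIILower("GB18030") == "gb18030");
    assert(equalIgnoringASCIICase("EUC-JP", "euc-jp"));
    assert(!equalIgnoringASCIICase("gb18030", "gbk"));
}
```

With a helper like the first one applied where the canonical name is produced, a later case-only change in ICU's alias table would no longer be visible to callers. The second helper is what each caller would need if the canonical name were left as ICU reports it.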
Darin Adler
Comment 1
2009-08-26 18:30:30 PDT
In theory I like the idea of TextEncoding always using the "canonical" capitalization of charset names, if such a thing existed. Lacking that, lowercasing all the names and changing the tests sounds OK to me, as long as it doesn't affect performance.
Alexey Proskuryakov
Comment 2
2009-08-26 18:42:18 PDT
Won't this change what JavaScript code sees as document.charset? If so, there's certain potential for negative web site compatibility effects - which is difficult to justify by ease of writing regression tests in my opinion.
Jungshik Shin
Comment 3
2009-09-02 12:47:35 PDT
(In reply to comment #2)
> Won't this change what JavaScript code sees as document.charset? If so, there's
> certain potential for negative web site compatibility effects - which is
> difficult to justify by ease of writing regression tests in my opinion.
In theory, all that JS code and the server-side code behind it should match charset names case-insensitively, but in practice, you're right that there's a risk. I'll check what other browsers emit for document.charset (uppercase or lowercase). It seems that only GB18030 vs. gb18030 changed in ICU 4.2. An alternative to lowercasing all charset names is to special-case GB18030 vs. gb18030 in TextCodecICU.cpp, probably inside an #ifdef to make it ICU-version-specific.
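A minimal sketch of that #ifdef alternative, under stated assumptions. `canonicalNameForICUName` is a hypothetical function, not the actual code in TextCodecICU.cpp. The version macros are ICU's `U_ICU_VERSION_MAJOR_NUM` and `U_ICU_VERSION_MINOR_NUM`; here they are given stand-in values so the sketch compiles without ICU headers.

```cpp
#include <cassert>
#include <cstring>

// Stand-in values so this sketch builds on its own. In a real build these
// macros would come from ICU's headers rather than being defined here.
#ifndef U_ICU_VERSION_MAJOR_NUM
#define U_ICU_VERSION_MAJOR_NUM 4
#define U_ICU_VERSION_MINOR_NUM 2
#endif

// Hypothetical function: map the name ICU reports to the name exposed as
// the canonical name. Only the one entry known to have changed is touched.
const char* canonicalNameForICUName(const char* icuName)
{
#if U_ICU_VERSION_MAJOR_NUM > 4 || (U_ICU_VERSION_MAJOR_NUM == 4 && U_ICU_VERSION_MINOR_NUM >= 2)
    // ICU 4.2 added uppercase "GB18030" as the MIME name; keep the old spelling.
    if (!std::strcmp(icuName, "GB18030"))
        return "gb18030";
#endif
    return icuName;
}
```

This keeps every other name, such as 'EUC-JP', exactly as before, so document.charset would not change for other encodings. The cost is that any future case change in the alias table would need another special case.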
Alexey Proskuryakov
Comment 4
2014-01-01 23:20:50 PST
See also: bug 125225.