RESOLVED FIXED 43554
Way too many encoding aliases are treated as valid
https://bugs.webkit.org/show_bug.cgi?id=43554
Summary Way too many encoding aliases are treated as valid
Alexey Proskuryakov
Reported 2010-08-05 05:17:25 PDT
TextEncodingRegistry ignores non-alphanumeric characters when comparing encoding names. This is not what other browsers do, and this causes compatibility problems when incorrectly specified encoding is expected to be ignored and inherited from parent frame. Some examples of what WebKit parses, but other browsers don't: "8859_1", // <rdar://problem/7859068> "ISO8859_1", // <rdar://problem/7863399> "8859-1", // WebKit used to specifically add this alias name, but other browsers don't support it. "ISO_2022,locale=ja,version=0", // We never want versioned alias names, other browsers don't support these. "utf 8", // Other weird variations "utf_8", "8859 1", "8859*1", "8859:1", "88591", "ISO_88591", "ISO-88591", "ISO-88-59-1", "latin-1" // Yes, it's "latin1" without a dash - neither IE nor Firefox support "latin-1".
Attachments
proposed patch (17.65 KB, patch)
2010-08-05 06:59 PDT, Alexey Proskuryakov
darin: review+
Alexey Proskuryakov
Comment 1 2010-08-05 06:59:56 PDT
Created attachment 63585 [details] proposed patch
WebKit Review Bot
Comment 2 2010-08-05 07:01:54 PDT
Attachment 63585 [details] did not pass style-queue: Failed to run "['WebKitTools/Scripts/check-webkit-style']" exit_code: 1 WebCore/platform/text/TextEncodingRegistry.cpp:162: Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons. [readability/comparison_to_zero] [5] Total errors found: 1 in 11 files If any of these errors are false positives, please file a bug against check-webkit-style.
Darin Adler
Comment 3 2010-08-05 07:51:20 PDT
Is this going to cause problem with WebKit-specific content that relies on this behavior that we've had for the past 7 or so years?
Darin Adler
Comment 4 2010-08-05 07:55:26 PDT
Comment on attachment 63585 [details] proposed patch > +// If passed any non-ASCII characters, depends on the behavior of isalnum -- > +// if that returns false as it does on OS X, then it will properly skip those > +// characters too. This comment is not true, and should be removed. > struct TextEncodingNameHash { With the changes you have made, is TextEncodingNameHash different from CaseFoldingHash? Can we just use CaseFoldingHash instead? > + if (0 == strcmp(alias, "8859_1")) > + return true; Would be good to add a comment explaining why we don't want this alias. Generally speaking this seems like a good patch, but I am slightly worried about backward compatibility.
Joseph Pecoraro
Comment 5 2010-08-05 08:55:36 PDT
Comment on attachment 63585 [details] proposed patch If you haven't landed this, there is a typo in the ChangeLog: > Index: WebCore/ChangeLog > =================================================================== > + * platform/text/TextEncoding.cpp: (WebCore::Latin1Encoding): > + "Latin-1" is not a real encoding name, it's not known ot Firefox or IE. TYPO: "not known ot Firefox" => "not known to Firefox"
Alexey Proskuryakov
Comment 6 2010-08-05 11:42:50 PDT
> With the changes you have made, is TextEncodingNameHash different from CaseFoldingHash? > Can we just use CaseFoldingHash instead? I'm not sure - the code is different, as it's ASCII-only, but maybe the difference in behavior/performance is negligible. > Is this going to cause problem with WebKit-specific content that relies on this behavior > that we've had for the past 7 or so years? That seems likely, yes. But I think that the extent of forgiveness has changed semi-recently, with the introduction of TextEncodingRegistry. Maybe my memory is wrong.
Alexey Proskuryakov
Comment 7 2010-08-05 22:37:13 PDT
WebKit Review Bot
Comment 8 2010-08-05 22:48:43 PDT
http://trac.webkit.org/changeset/64817 might have broken GTK Linux 64-bit Debug
Alexey Proskuryakov
Comment 9 2010-08-05 23:53:01 PDT
Follow up fix in <http://trac.webkit.org/changeset/64820> - added "shift-jis".
Alexey Proskuryakov
Comment 10 2010-08-06 04:26:09 PDT
More follow-up fixing in r64831 - don't register ISO8859-16, which was causing crashes on platforms with older ICU, and which isn't needed (at least, not yet).
Note You need to log in before you can comment on or make changes to this bug.