TextEncodingRegistry ignores non-alphanumeric characters when comparing encoding names. This is not what other browsers do, and this causes compatibility problems when incorrectly specified encoding is expected to be ignored and inherited from parent frame. Some examples of what WebKit parses, but other browsers don't: "8859_1", // <rdar://problem/7859068> "ISO8859_1", // <rdar://problem/7863399> "8859-1", // WebKit used to specifically add this alias name, but other browsers don't support it. "ISO_2022,locale=ja,version=0", // We never want versioned alias names, other browsers don't support these. "utf 8", // Other weird variations "utf_8", "8859 1", "8859*1", "8859:1", "88591", "ISO_88591", "ISO-88591", "ISO-88-59-1", "latin-1" // Yes, it's "latin1" without a dash - neither IE nor Firefox support "latin-1".
Created attachment 63585 [details] proposed patch
Attachment 63585 [details] did not pass style-queue: Failed to run "['WebKitTools/Scripts/check-webkit-style']" exit_code: 1 WebCore/platform/text/TextEncodingRegistry.cpp:162: Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons. [readability/comparison_to_zero] [5] Total errors found: 1 in 11 files If any of these errors are false positives, please file a bug against check-webkit-style.
Is this going to cause problem with WebKit-specific content that relies on this behavior that we've had for the past 7 or so years?
Comment on attachment 63585 [details] proposed patch > +// If passed any non-ASCII characters, depends on the behavior of isalnum -- > +// if that returns false as it does on OS X, then it will properly skip those > +// characters too. This comment is not true, and should be removed. > struct TextEncodingNameHash { With the changes you have made, is TextEncodingNameHash different from CaseFoldingHash? Can we just use CaseFoldingHash instead? > + if (0 == strcmp(alias, "8859_1")) > + return true; Would be good to add a comment explaining why we don't want this alias. Generally speaking this seems like a good patch, but I am slightly worried about backward compatibility.
Comment on attachment 63585 [details] proposed patch If you haven't landed this, there is a typo in the ChangeLog: > Index: WebCore/ChangeLog > =================================================================== > + * platform/text/TextEncoding.cpp: (WebCore::Latin1Encoding): > + "Latin-1" is not a real encoding name, it's not known ot Firefox or IE. TYPO: "not known ot Firefox" => "not known to Firefox"
> With the changes you have made, is TextEncodingNameHash different from CaseFoldingHash? > Can we just use CaseFoldingHash instead? I'm not sure - the code is different, as it's ASCII-only, but maybe the difference in behavior/performance is negligible. > Is this going to cause problem with WebKit-specific content that relies on this behavior > that we've had for the past 7 or so years? That seems likely, yes. But I think that the extent of forgiveness has changed semi-recently, with the introduction of TextEncodingRegistry. Maybe my memory is wrong.
Committed <http://trac.webkit.org/changeset/64817>.
http://trac.webkit.org/changeset/64817 might have broken GTK Linux 64-bit Debug
Follow up fix in <http://trac.webkit.org/changeset/64820> - added "shift-jis".
More follow-up fixing in r64831 - don't register ISO8859-16, which was causing crashes on platforms with older ICU, and which isn't needed (at least, not yet).