43554 – Way too many encoding aliases are treated as valid

RESOLVED FIXED 43554

Way too many encoding aliases are treated as valid

https://bugs.webkit.org/show_bug.cgi?id=43554

Summary Way too many encoding aliases are treated as valid

Alexey Proskuryakov

Reported 2010-08-05 05:17:25 PDT

TextEncodingRegistry ignores non-alphanumeric characters when comparing encoding names. This is not what other browsers do, and this causes compatibility problems when incorrectly specified encoding is expected to be ignored and inherited from parent frame. Some examples of what WebKit parses, but other browsers don't: "8859_1", // <rdar://problem/7859068> "ISO8859_1", // <rdar://problem/7863399> "8859-1", // WebKit used to specifically add this alias name, but other browsers don't support it. "ISO_2022,locale=ja,version=0", // We never want versioned alias names, other browsers don't support these. "utf 8", // Other weird variations "utf_8", "8859 1", "8859*1", "8859:1", "88591", "ISO_88591", "ISO-88591", "ISO-88-59-1", "latin-1" // Yes, it's "latin1" without a dash - neither IE nor Firefox support "latin-1".

Attachments
proposed patch (17.65 KB, patch) 2010-08-05 06:59 PDT, Alexey Proskuryakov	darin: review+	Details Formatted Diff Diff
View All Add attachment proposed patch, testcase, etc.

Alexey Proskuryakov

Comment 1 2010-08-05 06:59:56 PDT

Created attachment 63585 [details] proposed patch

WebKit Review Bot

Comment 2 2010-08-05 07:01:54 PDT

Attachment 63585 [details] did not pass style-queue: Failed to run "['WebKitTools/Scripts/check-webkit-style']" exit_code: 1 WebCore/platform/text/TextEncodingRegistry.cpp:162: Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons. [readability/comparison_to_zero] [5] Total errors found: 1 in 11 files If any of these errors are false positives, please file a bug against check-webkit-style.

Darin Adler

Comment 3 2010-08-05 07:51:20 PDT

Is this going to cause problem with WebKit-specific content that relies on this behavior that we've had for the past 7 or so years?

Darin Adler

Comment 4 2010-08-05 07:55:26 PDT

Comment on attachment 63585 [details] proposed patch > +// If passed any non-ASCII characters, depends on the behavior of isalnum -- > +// if that returns false as it does on OS X, then it will properly skip those > +// characters too. This comment is not true, and should be removed. > struct TextEncodingNameHash { With the changes you have made, is TextEncodingNameHash different from CaseFoldingHash? Can we just use CaseFoldingHash instead? > + if (0 == strcmp(alias, "8859_1")) > + return true; Would be good to add a comment explaining why we don't want this alias. Generally speaking this seems like a good patch, but I am slightly worried about backward compatibility.

Joseph Pecoraro

Comment 5 2010-08-05 08:55:36 PDT

Comment on attachment 63585 [details] proposed patch If you haven't landed this, there is a typo in the ChangeLog: > Index: WebCore/ChangeLog > =================================================================== > + * platform/text/TextEncoding.cpp: (WebCore::Latin1Encoding): > + "Latin-1" is not a real encoding name, it's not known ot Firefox or IE. TYPO: "not known ot Firefox" => "not known to Firefox"

Alexey Proskuryakov

Comment 6 2010-08-05 11:42:50 PDT

> With the changes you have made, is TextEncodingNameHash different from CaseFoldingHash? > Can we just use CaseFoldingHash instead? I'm not sure - the code is different, as it's ASCII-only, but maybe the difference in behavior/performance is negligible. > Is this going to cause problem with WebKit-specific content that relies on this behavior > that we've had for the past 7 or so years? That seems likely, yes. But I think that the extent of forgiveness has changed semi-recently, with the introduction of TextEncodingRegistry. Maybe my memory is wrong.

Alexey Proskuryakov

Comment 7 2010-08-05 22:37:13 PDT

Committed <http://trac.webkit.org/changeset/64817>.

WebKit Review Bot

Comment 8 2010-08-05 22:48:43 PDT

http://trac.webkit.org/changeset/64817 might have broken GTK Linux 64-bit Debug

Alexey Proskuryakov

Comment 9 2010-08-05 23:53:01 PDT

Follow up fix in <http://trac.webkit.org/changeset/64820> - added "shift-jis".

Alexey Proskuryakov

Comment 10 2010-08-06 04:26:09 PDT

More follow-up fixing in r64831 - don't register ISO8859-16, which was causing crashes on platforms with older ICU, and which isn't needed (at least, not yet).

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware All

OS All

Product WebKit

Component DOM

Assignee

Alexey Proskuryakov

Reported

2010-08-05 05:17 PDT

Modified

2010-08-06 04:26 PDT History

CC List

4 users Show

URL

Keywords

Depends on

Blocks