Bug 43554 - Way too many encoding aliases are treated as valid
Summary: Way too many encoding aliases are treated as valid
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: DOM (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Alexey Proskuryakov
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-08-05 05:17 PDT by Alexey Proskuryakov
Modified: 2010-08-06 04:26 PDT (History)
4 users (show)

See Also:


Attachments
proposed patch (17.65 KB, patch)
2010-08-05 06:59 PDT, Alexey Proskuryakov
darin: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Alexey Proskuryakov 2010-08-05 05:17:25 PDT
TextEncodingRegistry ignores non-alphanumeric characters when comparing encoding names. This is not what other browsers do, and this causes compatibility problems when incorrectly specified encoding is expected to be ignored and inherited from parent frame.

Some examples of what WebKit parses, but other browsers don't:

    "8859_1", // <rdar://problem/7859068>
    "ISO8859_1", // <rdar://problem/7863399>
    "8859-1", // WebKit used to specifically add this alias name, but other browsers don't support it.
    "ISO_2022,locale=ja,version=0", // We never want versioned alias names, other browsers don't support these.
    "utf 8", // Other weird variations
    "utf_8",
    "8859 1",
    "8859*1",
    "8859:1",
    "88591",
    "ISO_88591",
    "ISO-88591",
    "ISO-88-59-1",
    "latin-1" // Yes, it's "latin1" without a dash - neither IE nor Firefox support "latin-1".
Comment 1 Alexey Proskuryakov 2010-08-05 06:59:56 PDT
Created attachment 63585 [details]
proposed patch
Comment 2 WebKit Review Bot 2010-08-05 07:01:54 PDT
Attachment 63585 [details] did not pass style-queue:

Failed to run "['WebKitTools/Scripts/check-webkit-style']" exit_code: 1
WebCore/platform/text/TextEncodingRegistry.cpp:162:  Tests for true/false, null/non-null, and zero/non-zero should all be done without equality comparisons.  [readability/comparison_to_zero] [5]
Total errors found: 1 in 11 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 3 Darin Adler 2010-08-05 07:51:20 PDT
Is this going to cause problem with WebKit-specific content that relies on this behavior that we've had for the past 7 or so years?
Comment 4 Darin Adler 2010-08-05 07:55:26 PDT
Comment on attachment 63585 [details]
proposed patch

> +// If passed any non-ASCII characters, depends on the behavior of isalnum --
> +// if that returns false as it does on OS X, then it will properly skip those
> +// characters too.

This comment is not true, and should be removed.

>  struct TextEncodingNameHash {

With the changes you have made, is TextEncodingNameHash different from CaseFoldingHash? Can we just use CaseFoldingHash instead?

> +    if (0 == strcmp(alias, "8859_1"))
> +        return true;

Would be good to add a comment explaining why we don't want this alias.

Generally speaking this seems like a good patch, but I am slightly worried about backward compatibility.
Comment 5 Joseph Pecoraro 2010-08-05 08:55:36 PDT
Comment on attachment 63585 [details]
proposed patch

If you haven't landed this, there is a typo in the ChangeLog:

> Index: WebCore/ChangeLog
> ===================================================================
> +        * platform/text/TextEncoding.cpp: (WebCore::Latin1Encoding):
> +        "Latin-1" is not a real encoding name, it's not known ot Firefox or IE.

TYPO: "not known ot Firefox" => "not known to Firefox"
Comment 6 Alexey Proskuryakov 2010-08-05 11:42:50 PDT
> With the changes you have made, is TextEncodingNameHash different from CaseFoldingHash? 
> Can we just use CaseFoldingHash instead?

I'm not sure - the code is different, as it's ASCII-only, but maybe the difference in behavior/performance is negligible.

> Is this going to cause problem with WebKit-specific content that relies on this behavior
>  that we've had for the past 7 or so years?

That seems likely, yes. But I think that the extent of forgiveness has changed semi-recently, with the introduction of TextEncodingRegistry. Maybe my memory is wrong.
Comment 7 Alexey Proskuryakov 2010-08-05 22:37:13 PDT
Committed <http://trac.webkit.org/changeset/64817>.
Comment 8 WebKit Review Bot 2010-08-05 22:48:43 PDT
http://trac.webkit.org/changeset/64817 might have broken GTK Linux 64-bit Debug
Comment 9 Alexey Proskuryakov 2010-08-05 23:53:01 PDT
Follow up fix in <http://trac.webkit.org/changeset/64820> - added "shift-jis".
Comment 10 Alexey Proskuryakov 2010-08-06 04:26:09 PDT
More follow-up fixing in r64831 - don't register ISO8859-16, which was causing crashes on platforms with older ICU, and which isn't needed (at least, not yet).