WebKit Bugzilla
RESOLVED WONTFIX
Bug 54582: REGRESSION (r73756): Some page content not displaying correctly at pay.37wan.com
https://bugs.webkit.org/show_bug.cgi?id=54582
Adele Peterson
Reported
2011-02-16 13:53:58 PST
Created attachment 82686: test

Steps to reproduce:
1. Navigate to http://www.37wan.com/.
2. Enter a username into the top text field.
3. Enter a password into the bottom text field.
4. Press the login button (left orange button).
5. Press the index button (middle yellow button within the orange box).

Result: Most of the text on the page is character garbage.

If you don't have a login, please see the attached reduction, which is the following markup:

<meta http-equiv="Content-Type" contet="text/html; charset=UTF-8" /> 充值中心| webgame-37wan网页游戏平台

(The Chinese text reads roughly "Top-up Center | webgame-37wan web game platform".)

So this site has a typo: "contet" instead of "content". If this works in all shipping browsers, we should probably send feedback to the HTML WG and make a fix.
<rdar://problem/9006151>
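Under stricter parsing, a meta tag's encoding comes only from a `charset` attribute or from a `content` attribute when `http-equiv` is `Content-Type`, so a misspelled `contet="..."` is simply ignored. The following is a minimal Python sketch of that behavior, assuming a simplified version of the HTML spec's meta prescan (it uses the standard-library `html.parser` instead of the spec's byte-level prescan and skips many details). The names `prescan_charset`, `MetaCharsetFinder`, and `extract_charset_from_content` are hypothetical; this is not WebKit's `TextResourceDecoder` code.

```python
import re
from html.parser import HTMLParser
from typing import Optional

# Simplified take on "extracting a character encoding from a meta element":
# "charset", optional whitespace, "=", optional whitespace, then a quoted or bare value.
_CHARSET_IN_CONTENT = re.compile(
    r"""charset\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s;"']+))""", re.IGNORECASE
)


def extract_charset_from_content(value: str) -> Optional[str]:
    match = _CHARSET_IN_CONTENT.search(value)
    if match is None:
        return None
    # Exactly one alternative matched; return its captured value.
    return next(group for group in match.groups() if group is not None)


class MetaCharsetFinder(HTMLParser):
    """Records the first charset declared by a <meta> tag (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.charset: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        # HTMLParser lowercases attribute names; keep the first occurrence of each.
        attributes = {}
        for name, value in attrs:
            attributes.setdefault(name, value or "")
        if "charset" in attributes:
            self.charset = attributes["charset"].strip()
        elif (attributes.get("http-equiv", "").strip().lower() == "content-type"
              and "content" in attributes):
            # Only the attribute literally named "content" is examined.
            self.charset = extract_charset_from_content(attributes["content"])


def prescan_charset(markup: str) -> Optional[str]:
    finder = MetaCharsetFinder()
    finder.feed(markup)
    finder.close()
    return finder.charset
```

With the typo, `prescan_charset('<meta http-equiv="Content-Type" contet="text/html; charset=UTF-8" />')` returns `None`, so no encoding is found in the markup; spelled `content=`, the same tag yields `'UTF-8'`.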
Attachments
test (115 bytes, text/html), 2011-02-16 13:53 PST, Adele Peterson
Adele Peterson
Comment 1
2011-02-16 13:56:45 PST
This is the change that caused the regression:
http://trac.webkit.org/changeset/73756
TextResourceDecoder::checkForHeadCharset can look way past the limit.
https://bugs.webkit.org/show_bug.cgi?id=47397
Andy Estes
Comment 2
2011-02-16 13:59:47 PST
Note that this test case isn't quite right, since Bugzilla serves it with a correct Content-Type header (text/html; charset=UTF-8), which takes precedence over the meta tag. Saving the test case and opening it locally should do the trick.
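The attachment does not reproduce the bug when served over HTTP because a charset in the HTTP Content-Type header takes priority over a `<meta>` declaration. Below is a simplified Python sketch of that ordering. It ignores byte order marks and the other steps of the HTML spec's encoding sniffing algorithm; `choose_encoding` is a hypothetical name, and the `windows-1252` fallback is an assumption for illustration, since real browsers pick a locale-dependent default.

```python
from email.message import Message
from typing import Optional


def choose_encoding(content_type_header: Optional[str],
                    meta_charset: Optional[str],
                    fallback: str = "windows-1252") -> str:
    # 1. A charset parameter in the HTTP Content-Type header wins.
    if content_type_header:
        msg = Message()
        msg["Content-Type"] = content_type_header
        header_charset = msg.get_content_charset()  # lowercased, or None
        if header_charset:
            return header_charset
    # 2. Otherwise use whatever the <meta> prescan found.
    if meta_charset:
        return meta_charset.lower()
    # 3. Otherwise fall back to a default (assumed here; browsers vary by locale).
    return fallback
```

Served from Bugzilla, `choose_encoding("text/html; charset=UTF-8", None)` returns `'utf-8'` regardless of the markup; opened from disk there is no header, so the outcome depends entirely on what the meta prescan finds.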
Jenn Braithwaite
Comment 3
2011-02-16 14:43:46 PST
(In reply to comment #1)
> This is the change that caused the regression:
>
> http://trac.webkit.org/changeset/73756
> TextResourceDecoder::checkForHeadCharset can look way past the limit.
> https://bugs.webkit.org/show_bug.cgi?id=47397
Prior to this change, browsers were extremely lenient with charset specification syntax. Something like:

<meta http-equiv="Content-Type" foobar="notacharset=UTF-8" /> 充值中心| webgame-37wan网页游戏平台

would still work. Since the goal of the change was to make charset detection more precise, I recommend not going backwards, even if the above (and the test case for this bug) work in all shipping browsers.
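To make the leniency concrete, here is a rough Python approximation, assuming (this is not the actual pre-r73756 WebKit code) that the scanner accepted any `charset=` substring anywhere inside a `<meta>` tag. `lenient_charset` is a hypothetical name.

```python
import re
from typing import Optional

# Any <meta ...> tag, without attempting to parse its attributes.
_META_TAG = re.compile(r"<meta\b[^>]*>", re.IGNORECASE)
# "charset=" anywhere in the tag text, even inside another word such as
# "notacharset=" or inside an attribute with a misspelled name.
_ANY_CHARSET = re.compile(r"""charset\s*=\s*["']?([^\s"';>/]+)""", re.IGNORECASE)


def lenient_charset(markup: str) -> Optional[str]:
    for tag in _META_TAG.findall(markup):
        match = _ANY_CHARSET.search(tag)
        if match:
            return match.group(1)
    return None
```

Both the `contet=` reduction and the `foobar="notacharset=UTF-8"` markup yield `'UTF-8'` under this approximation, which is the kind of imprecision the change set out to remove.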
Alexey Proskuryakov
Comment 4
2011-02-16 14:59:27 PST
It's extremely easy to make a typo like the one here - and if it used to work in all browsers, breaking that would be unfortunate. Generally speaking, making parsing more strict is rarely helpful.
Jenn Braithwaite
Comment 5
2011-02-16 15:38:04 PST
A small deviation from the spec would be: if http-equiv="Content-Type" is seen in a meta tag, extract "charset=xxx" from the other attributes regardless of attribute name. If this sounds acceptable, I'll make the change.
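One possible reading of this proposed deviation, sketched in Python: when a `<meta>` tag has `http-equiv="Content-Type"`, every other attribute value is searched for `charset=`. Requiring `charset` to start at a word boundary (so `notacharset=` is still rejected) is an assumption added here, not part of the proposal, and `proposed_charset` and `ProposedRuleFinder` are hypothetical names.

```python
import re
from html.parser import HTMLParser
from typing import Optional

# "charset=" at a word boundary (assumption), then an optionally quoted value.
_CHARSET = re.compile(r"""\bcharset\s*=\s*["']?([^\s"';]+)""", re.IGNORECASE)


class ProposedRuleFinder(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.charset: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        http_equiv = next((v or "" for n, v in attrs if n == "http-equiv"), "")
        if http_equiv.strip().lower() != "content-type":
            return
        # Search every other attribute value, whatever the attribute is called.
        for name, value in attrs:
            if name == "http-equiv" or not value:
                continue
            match = _CHARSET.search(value)
            if match:
                self.charset = match.group(1)
                return


def proposed_charset(markup: str) -> Optional[str]:
    finder = ProposedRuleFinder()
    finder.feed(markup)
    finder.close()
    return finder.charset
```

Under this reading the site's `contet=` typo would be accepted again, while `foobar="notacharset=UTF-8"` and a charset in a tag without `http-equiv` would not.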
Adam Barth
Comment 6
2011-02-16 15:39:22 PST
(In reply to comment #5)
> A small deviation from the spec would be if "http-equiv='Content-Type' is seen in a meta tag, extract "charset=xxx" from other attributes regardless of attribute name.
>
> If this sounds acceptable, I'll make the change.
You should probably grab Hixie on #whatwg and touch base with him before implementing that change.
Ian 'Hixie' Hickson
Comment 7
2011-02-16 16:04:24 PST
From what I can tell, this would be the wrong change. In particular, if it were correct, I'd expect all the browsers to get a "fail" (1252) on this test case, but every browser I tested gets a "pass" (1254):
http://www.hixie.ch/tests/adhoc/html/parsing/encoding/142.html
(To control for declaration order, I also have this test:
http://www.hixie.ch/tests/adhoc/html/parsing/encoding/143.html
...which seems to indicate the issue is not that browsers are just using the last one.)
Ian 'Hixie' Hickson
Comment 8
2011-02-16 16:31:53 PST
Looks like IE8, FF4, Opera 11, and the next versions of WebKit and Chrome will all break this page in the same way. I would vote to leave it broken. The alternative is making the parsing of charset declarations way more complicated and breaking away from IE8 compatibility on this issue, which seems suboptimal.