Bug 14636
| Summary: | REGRESSION: broken tags with unpaired quote prevents encode autodetection | ||
|---|---|---|---|
| Product: | WebKit | Reporter: | 808caaa4.8ce9.9cd6c799e9f6 |
| Component: | Page Loading | Assignee: | Nobody <webkit-unassigned> |
| Status: | NEW | ||
| Severity: | Major | CC: | ap, darin, ddkilzer, dimich, emacemac7, ian |
| Priority: | P1 | Keywords: | HasReduction, InRadar, Regression |
| Version: | 523.x (Safari 3) | ||
| Hardware: | All | ||
| OS: | All | ||
| URL: | http://developer.apple.com/jp/ | ||
808caaa4.8ce9.9cd6c799e9f6
// derived from bugs#14601
With some broken meta tags like:
> <meta http-equiv="Content-Type" content="text/html; charset="utf-8">
detectJapaneseEncoding() does not seem to be called.
Because the double quotes (\x22) are not correctly paired, checkForHeadCharset() loses sync on the quotes,
scans through all of the remaining content, and returns false
(at 'if(ptr == pEnd) return false;', line 588).
On almost all websites, a tag's attribute content does not contain linefeeds.
So I think it is realistic to stop scanning for the closing quote
when a linefeed occurs, and to bail out as if the check had succeeded.
My experimental code:
-----
while (ptr != pEnd && *ptr != quoteMark) {
    if (*ptr == '\r' || *ptr == '\n') {
        // Tag content is too long: we may have lost sync on the quotes.
        // Bail out as if the check had succeeded.
        m_checkedForHeadCharset = true;
        return true;
    }
    ++ptr;
}
-----
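
For reference, the same idea can be sketched as a standalone function outside the decoder. This is only an illustration under assumptions: `QuoteScan` and `scanQuotedValue` are hypothetical names, not WebKit code, and the caller would still have to decide what to do in each case (for example, set m_checkedForHeadCharset and return true on a line break, as in the snippet above).

```cpp
// Hypothetical result type for this sketch; not part of WebKit.
enum class QuoteScan { Closed, LineBreak, EndOfInput };

// Scans from ptr (assumed to be just past the opening quote) toward pEnd,
// looking for quoteMark. On return, ptr points at the closing quote,
// at the line break that stopped the scan, or at pEnd.
QuoteScan scanQuotedValue(const char*& ptr, const char* pEnd, char quoteMark)
{
    while (ptr != pEnd && *ptr != quoteMark) {
        // A line break inside a quoted value suggests the quotes are
        // unpaired, so stop here instead of consuming the rest of the input.
        if (*ptr == '\r' || *ptr == '\n')
            return QuoteScan::LineBreak;
        ++ptr;
    }
    return ptr == pEnd ? QuoteScan::EndOfInput : QuoteScan::Closed;
}
```

With the broken meta tag above, the scanner would be out of sync after `charset="`, so a scan that starts after the final `"` of the tag would hit the linefeed at the end of that line and report `LineBreak` rather than running to the end of the buffer.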
Alexey Proskuryakov
This is a regression from shipping WebKit, upgrading to P1.
See <http://www.whatwg.org/specs/web-apps/current-work/#get-an> - if I'm reading it correctly, we are not supposed to honor such a META. Which might mean that we need to suggest a correction to the HTML5 algorithm.
Also, I'm not sure why Firefox works - it's possible that it ignores the META, and auto-detects the encoding based on page text analysis.
David Kilzer (:ddkilzer)
*** Bug 14643 has been marked as a duplicate of this bug. ***
David Kilzer (:ddkilzer)
<rdar://problem/5340161>