Bug 14636 - REGRESSION: broken tags with unpaired quote prevents encode autodetection
Summary: REGRESSION: broken tags with unpaired quote prevents encode autodetection
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: Page Loading
Version: 523.x (Safari 3)
Hardware: All All
Importance: P1 Major
Assignee: Nobody
URL: http://developer.apple.com/jp/
Keywords: HasReduction, InRadar, Regression
Duplicates: 14643
Depends on:
Blocks:
 
Reported: 2007-07-17 03:00 PDT by 808caaa4.8ce9.9cd6c799e9f6
Modified: 2010-10-15 13:13 PDT
CC List: 6 users

Description 808caaa4.8ce9.9cd6c799e9f6 2007-07-17 03:00:44 PDT
// derived from bugs#14601

With some broken meta tags like:

> <meta http-equiv="Content-Type" content="text/html; charset="utf-8">

detectJapaneseEncoding() does not seem to be called.

When a double quote (\x22) is not correctly paired, checkForHeadCharset() loses
quote synchronization, consumes the whole remaining content, and returns false
(at 'if(ptr == pEnd) return false;', line 588).

On almost all websites, a tag or attribute value does not contain line breaks.
I think it is realistic to stop scanning for the closing quote, and bail out
successfully, when a line break occurs.

My experimental code:
-----
while (ptr != pEnd && *ptr != quoteMark) {
	if (*ptr == '\r' || *ptr == '\n') {
		// Line break inside a quoted value: quote sync may have been lost.
		// Bail out successfully instead of scanning the rest of the content.
		m_checkedForHeadCharset = true;
		return true;
	}
	++ptr;
}
-----
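
To illustrate the difference in isolation, here is a minimal standalone sketch of the same quote scan. It is not the WebKit code: scanQuotedValue, ScanResult, and the bailOutOnLineBreak flag are hypothetical names made up for this example, and it works on a std::string instead of the ptr/pEnd pair used above.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type for this sketch; it does not exist in WebKit.
enum class ScanResult { FoundClosingQuote, HitLineBreak, RanOutOfInput };

// Scans forward from `start` (the character just after an opening quote)
// looking for `quoteMark`. When `bailOutOnLineBreak` is true, the scan stops
// at the first '\r' or '\n', as proposed in the experimental code above.
// `end` receives the index where the scan stopped.
ScanResult scanQuotedValue(const std::string& input, std::size_t start,
                           char quoteMark, bool bailOutOnLineBreak,
                           std::size_t& end)
{
    std::size_t i = start;
    while (i < input.size() && input[i] != quoteMark) {
        if (bailOutOnLineBreak && (input[i] == '\r' || input[i] == '\n')) {
            end = i;
            return ScanResult::HitLineBreak;
        }
        ++i;
    }
    end = i;
    return i == input.size() ? ScanResult::RanOutOfInput
                             : ScanResult::FoundClosingQuote;
}
```

For the text that follows the stray quote in the broken meta tag, calling this with bailOutOnLineBreak set to false runs to the end of the input when no further quote appears, which corresponds to the 'ptr == pEnd' case described above. With the flag set to true, the scan stops at the first line break instead.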
Comment 1 Alexey Proskuryakov 2007-07-17 04:00:18 PDT
This is a regression from shipping WebKit, upgrading to P1.

See <http://www.whatwg.org/specs/web-apps/current-work/#get-an> - if I'm reading it correctly, we are not supposed to honor such a META, which might mean that we need to suggest a correction to the HTML5 algorithm.

Also, I'm not sure why Firefox works - it's possible that it ignores the META, and auto-detects the encoding based on page text analysis.
Comment 2 David Kilzer (:ddkilzer) 2007-07-17 08:27:30 PDT
*** Bug 14643 has been marked as a duplicate of this bug. ***
Comment 3 David Kilzer (:ddkilzer) 2007-07-17 08:29:01 PDT
<rdar://problem/5340161>