Summary: | should allow <meta> tags for encoding even after </head> | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Nicholas Shanks <nickshanks> | ||||||||||
Component: | Layout and Rendering | Assignee: | Darin Adler <darin> | ||||||||||
Status: | VERIFIED FIXED | ||||||||||||
Severity: | Normal | CC: | ap | ||||||||||
Priority: | P2 | ||||||||||||
Version: | 412 | ||||||||||||
Hardware: | Mac | ||||||||||||
OS: | OS X 10.4 | ||||||||||||
URL: | http://caracol.com.co/noticias/180350.asp | ||||||||||||
Attachments: |
|
Description
Nicholas Shanks
2005-06-17 06:37:54 PDT
I don't see any charset in the header: HTTP/1.1 200 OK Date: Fri, 17 Jun 2005 18:34:25 GMT Server: Microsoft-IIS/6.0 pragma: no-cache cache-control: max-age=0,no-cache,private,must-revalidate Content-Length: 37364 Content-Type: text/html Set-Cookie: ASPSESSIONIDSCDSADQR=LJNNDOHBOMKOOPPOMMIHCCKF; path=/ Cache-control: private nor in the head: <html> <head> <title>Noticias - Caracol Radio</title> <base href="http://www.caracol.com.co/"> <META NAME="DESCRIPTION" CONTENT="Caracol Radio :: Diez muertos y decenas de heridos por el terremoto de 7.9 grados en el norte de Chile"> <META NAME="ROBOTS" CONTENT="INDEX,FOLLOW"> <meta http-equiv=refresh content=300> <link rel="stylesheet" href="/stylos.css" type="text/css"> <script src="/scripts/base.js" type="text/javascript"></script> </head> In the testcase i will attach in a few secs, i have added a content-type meta in the head, which makes safari render the page just fine... Created attachment 2445 [details]
testcase
I have researched automatic charset detection for a while, here are my findings: 1. Safari does handle the encoding specified in Content-Type "meta http-equiv" header. However, see rdar://4127219 - Safari only looks for META inside HEAD. This may be formally correct, but some sites put the META after HEAD, and other browsers support this. 2. WebCore seems to have encoding auto-detection for Japanese. 3. I could not find where WebCore considers the HTTP Content-Type header (which should override the HTML META, not the other way). The question I have is when to attempt automatic guessing. In my experience, incorrect encoding doesn't usually lead to invalid characters. But trying to always detect the language and encoding may cause unpleasant user experience, especially with multi-language texts. As an aside, it seems that ICU is considering support for automatic charset detection (see the recent entries at http://icu.sourceforge.net/meetings/). sounds good to me :) Created attachment 3239 [details] META outside HEAD As for #1 (looking for "meta http-equiv" only within HEAD), here's a relevant Mozilla issue: <https://bugzilla.mozilla.org/show_bug.cgi?id=98700>. The attached testcase renders fine in Firefox, but not in Safari (to make it render in Safari, manually choose KOI8-R text encoding). The real life site is <http://www.oper.ru>. Created attachment 3294 [details]
META outside HEAD patch
With this patch, Decoder::decode() doesn't stop looking for a charset after a
</head>.
Also, source fixed to compile with DECODE_DEBUG.
Comment on attachment 3294 [details]
META outside HEAD patch
This change looks great, but we need to pair each change with a new layout
test. As far as I can tell the test attached to this bug is for a different
issue.
I think we're going to have to make more separate bug reports for these issues,
and I'll set this patch to review+ once I see it alongside a suitable layout
test.
Created attachment 3324 [details]
A better test for META outside HEAD
Includes English comments for easy manual testing and expected DumpRenderTree
output.
Comment on attachment 3294 [details]
META outside HEAD patch
So far, I cannot confirm any of the mentioned possible issues with the current
implementation, except for the META outside HEAD one. If any are found, I'll
file them separately, to keep this issue focused on auto-detection.
Comment on attachment 3294 [details]
META outside HEAD patch
Looks good. r=me
Retitling to reflect what's actually being fixed here. The bigger "cosmic issue" will have to be covered by other bug reports. |