Bug 3590

Summary: should allow <meta> tags for encoding even after </head>
Product: WebKit Reporter: Nicholas Shanks <nickshanks>
Component: Layout and Rendering Assignee: Darin Adler <darin>
Status: VERIFIED FIXED    
Severity: Normal CC: ap
Priority: P2    
Version: 412   
Hardware: Mac   
OS: OS X 10.4   
URL: http://caracol.com.co/noticias/180350.asp
Attachments (Description: Flags):
testcase: none
META outside HEAD: none
META outside HEAD patch: darin: review+
A better test for META outside HEAD: none

Nicholas Shanks
Reported 2005-06-17 06:37:54 PDT
Adapted from my comments to bug 3556:

1) Go to Safari preferences.
2) Set the default encoding to UTF-8.
3) Browse the internet for a bit, for example http://caracol.com.co/noticias/180350.asp

You will see that many sites aren't sending encoding information, Safari is ignoring the Content-Encoding HTTP header override <meta> tag, or it's ignoring the XML charset information for XHTML served as text/html (or all of the above, I can't really tell).

Whatever the cause, now that bug 3556 has been fixed it makes websites a little harder to read, as the user is not aware that the wrong encoding is being used. Words appear with letters missing, and this might change their meanings!

One solution I can think of would be to note all the invalid characters encountered and try to match up a likely encoding, based on document language perhaps, then suggest a document re-interpretation to the user. You might implement a list of trigger words, where for example a certain sequence of bytes is almost certainly the word добрый in KOI8-R.

The lack of encoding information is something that should be reported as an error when in web developer mode too. (P.S. Safari needs a web developer mode :-)
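To make the trigger-word idea concrete, here is a minimal sketch in Python. It is only an illustration of the suggestion above, not anything Safari or WebCore does. The function name guess_encoding and the TRIGGERS list are hypothetical; only the добрый/KOI8-R pair comes from the comment, and the windows-1251 entry is added purely as an assumption for illustration.

```python
# Hypothetical trigger list: (word, encoding) pairs. Only the first pair
# comes from the comment above; the second is an illustrative assumption.
TRIGGERS = [("добрый", "koi8_r"), ("добрый", "cp1251")]


def guess_encoding(raw: bytes, assumed: str = "utf-8"):
    """Suggest an encoding, or return None if `assumed` decodes cleanly
    or no trigger word is found."""
    try:
        raw.decode(assumed)
        return None  # no invalid byte sequences, so nothing to suggest
    except UnicodeDecodeError:
        pass
    for word, encoding in TRIGGERS:
        # Look for the exact byte sequence the word has in that encoding.
        if word.encode(encoding) in raw:
            return encoding
    return None


page = "<p>добрый день</p>".encode("koi8_r")
print(guess_encoding(page))
```

The sketch only fires when the assumed encoding produces invalid sequences, so it would miss pages where the wrong encoding still decodes without errors.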
Attachments
testcase (35.95 KB, text/html)
2005-06-17 11:37 PDT, Joost de Valk (AlthA)
no flags
META outside HEAD (395 bytes, text/html)
2005-08-06 04:20 PDT, Alexey Proskuryakov
no flags
META outside HEAD patch (1.50 KB, patch)
2005-08-09 12:15 PDT, Alexey Proskuryakov
darin: review+
A better test for META outside HEAD (1.53 KB, application/octet-stream)
2005-08-10 12:49 PDT, Alexey Proskuryakov
no flags
Joost de Valk (AlthA)
Comment 1 2005-06-17 11:36:24 PDT
I don't see any charset in the header:

    HTTP/1.1 200 OK
    Date: Fri, 17 Jun 2005 18:34:25 GMT
    Server: Microsoft-IIS/6.0
    pragma: no-cache
    cache-control: max-age=0,no-cache,private,must-revalidate
    Content-Length: 37364
    Content-Type: text/html
    Set-Cookie: ASPSESSIONIDSCDSADQR=LJNNDOHBOMKOOPPOMMIHCCKF; path=/
    Cache-control: private

nor in the head:

    <html>
    <head>
    <title>Noticias - Caracol Radio</title>
    <base href="http://www.caracol.com.co/">
    <META NAME="DESCRIPTION" CONTENT="Caracol Radio :: Diez muertos y decenas de heridos por el terremoto de 7.9 grados en el norte de Chile">
    <META NAME="ROBOTS" CONTENT="INDEX,FOLLOW">
    <meta http-equiv=refresh content=300>
    <link rel="stylesheet" href="/stylos.css" type="text/css">
    <script src="/scripts/base.js" type="text/javascript"></script>
    </head>

In the testcase I will attach in a few secs, I have added a content-type meta in the head, which makes Safari render the page just fine...
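For anyone who wants to repeat the header check, a minimal Python sketch follows. It uses the standard library's email.message.Message to parse a Content-Type value; the helper name header_charset and the second, "fixed" header value are hypothetical and not taken from the site.

```python
from email.message import Message


def header_charset(content_type: str):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the charset lowercased, or None if absent.
    return msg.get_content_charset()


print(header_charset("text/html"))                  # the value sent by the site above
print(header_charset("text/html; charset=KOI8-R"))  # hypothetical header with a charset
```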
Joost de Valk (AlthA)
Comment 2 2005-06-17 11:37:04 PDT
Created attachment 2445 [details] testcase
Alexey Proskuryakov
Comment 3 2005-06-17 11:54:17 PDT
I have researched automatic charset detection for a while, here are my findings:

1. Safari does handle the encoding specified in Content-Type "meta http-equiv" header. However, see rdar://4127219 - Safari only looks for META inside HEAD. This may be formally correct, but some sites put the META after HEAD, and other browsers support this.
2. WebCore seems to have encoding auto-detection for Japanese.
3. I could not find where WebCore considers the HTTP Content-Type header (which should override the HTML META, not the other way).

The question I have is when to attempt automatic guessing. In my experience, incorrect encoding doesn't usually lead to invalid characters. But trying to always detect the language and encoding may cause unpleasant user experience, especially with multi-language texts.

As an aside, it seems that ICU is considering support for automatic charset detection (see the recent entries at http://icu.sourceforge.net/meetings/).
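The precedence described in point 3 can be sketched in a few lines of Python. This is an assumption about the intended order (HTTP header first, then META, then the user's default), not a description of what WebCore does; the function choose_encoding and its parameter names are hypothetical.

```python
def choose_encoding(http_charset, meta_charset, user_default="utf-8"):
    """Pick the first available source: HTTP header, then META, then default."""
    return http_charset or meta_charset or user_default


# Header present: it wins over the META value.
print(choose_encoding("windows-1251", "koi8-r"))
# No header charset (as on the site in this bug): fall back to META.
print(choose_encoding(None, "koi8-r"))
# Neither present: the user's default encoding applies.
print(choose_encoding(None, None))
```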
Joost de Valk (AlthA)
Comment 4 2005-06-23 10:10:16 PDT
sounds good to me :)
Alexey Proskuryakov
Comment 5 2005-08-06 04:20:23 PDT
Created attachment 3239 [details]
META outside HEAD

As for #1 (looking for "meta http-equiv" only within HEAD), here's a relevant Mozilla issue: <https://bugzilla.mozilla.org/show_bug.cgi?id=98700>. The attached testcase renders fine in Firefox, but not in Safari (to make it render in Safari, manually choose KOI8-R text encoding). The real-life site is <http://www.oper.ru>.
Alexey Proskuryakov
Comment 6 2005-08-09 12:15:53 PDT
Created attachment 3294 [details]
META outside HEAD patch

With this patch, Decoder::decode() doesn't stop looking for a charset after a </head>. Also, source fixed to compile with DECODE_DEBUG.
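The behavior change can be illustrated with a rough Python sketch. This is not the WebKit patch and does not mirror Decoder::decode(); the function find_meta_charset, its stop_at_head_end flag, and the regular expressions are all hypothetical simplifications (a regex scan rather than a real HTML parser), meant only to show the difference between stopping at </head> and scanning past it.

```python
import re

# Matches charset=... inside a <meta ...> tag. A simplification, not an HTML parser.
META_CHARSET = re.compile(rb"<meta[^>]*charset\s*=\s*[\"']?([\w.:-]+)", re.IGNORECASE)
HEAD_END = re.compile(rb"</head\s*>", re.IGNORECASE)


def find_meta_charset(raw: bytes, stop_at_head_end: bool):
    """Return the charset named in the first matching META tag, or None."""
    if stop_at_head_end:
        end = HEAD_END.search(raw)
        if end:
            raw = raw[: end.start()]  # old behavior: ignore everything after </head>
    match = META_CHARSET.search(raw)
    return match.group(1).decode("ascii").lower() if match else None


# A document shaped like the "META outside HEAD" testcase: the META follows </head>.
doc = (b"<html><head><title>t</title></head>"
       b'<meta http-equiv="Content-Type" content="text/html; charset=KOI8-R">'
       b"<body>text</body></html>")
print(find_meta_charset(doc, stop_at_head_end=True))   # META is missed
print(find_meta_charset(doc, stop_at_head_end=False))  # META is found
```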
Darin Adler
Comment 7 2005-08-10 10:38:14 PDT
Comment on attachment 3294 [details]
META outside HEAD patch

This change looks great, but we need to pair each change with a new layout test. As far as I can tell the test attached to this bug is for a different issue. I think we're going to have to make more separate bug reports for these issues, and I'll set this patch to review+ once I see it alongside a suitable layout test.
Alexey Proskuryakov
Comment 8 2005-08-10 12:49:54 PDT
Created attachment 3324 [details]
A better test for META outside HEAD

Includes English comments for easy manual testing and expected DumpRenderTree output.
Alexey Proskuryakov
Comment 9 2005-08-10 13:08:36 PDT
Comment on attachment 3294 [details]
META outside HEAD patch

So far, I cannot confirm any of the mentioned possible issues with the current implementation, except for the META outside HEAD one. If any are found, I'll file them separately, to keep this issue focused on auto-detection.
Darin Adler
Comment 10 2005-08-10 13:25:36 PDT
Comment on attachment 3294 [details]
META outside HEAD patch

Looks good. r=me
Darin Adler
Comment 11 2005-08-14 01:22:55 PDT
Retitling to reflect what's actually being fixed here. The bigger "cosmic issue" will have to be covered by other bug reports.