Bug 14500

Summary: need to be more generous about charset declaration with meta tag
Product: WebKit
Reporter: Jungshik Shin <jshin>
Component: Page Loading
Assignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED
Severity: Normal
CC: ap, ddkilzer
Priority: P2    
Version: 523.x (Safari 3)   
Hardware: All   
OS: All   
Attachments:
Yahoo! Mail example (flags: none)

Description Jungshik Shin 2007-07-02 15:52:22 PDT
http://hanarei.blog32.fc2.com/

has a strange structure. Note that the html tag appears twice, and so does head. The charset declaration in the meta tag appears in the second head. WebKit does not honor it, while Firefox and IE do.


<HTML>
<HEAD>
<TITLE>無料オンラインゲームに参加しよう♪</TITLE>
</HEAD>
<BODY>
<script type="text/javascript"><!--
var ID="100099131";
var AD=0;
var FRAME=0;
// --></script>
<script src="http://j1.ax.xrea.com/l.j?id=100099131" type="text/javascript"></script>
<noscript>
<a href="http://w1.ax.xrea.com/c.f?id=100099131" target="_blank"><img src="http://w1.ax.xrea.com/l.f?id=100099131&url=X" alt="AX" border="0"></a>
</noscript>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html lang="ja" xmlns="http://www.w3.org/1999/xhtml" xml:lang="ja">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=EUC-JP" />
Comment 1 Alexey Proskuryakov 2007-07-02 22:27:21 PDT
WebKit currently stops looking for a charset declaration as soon as it reaches the document body (for performance reasons).

See also: bug 12526.
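To make the difference between the two behaviors concrete, here is a minimal sketch in Python (not WebKit's actual code) of a regex-based meta-charset scanner with a `stop_at_body` switch. The function name `find_meta_charset` and the `BODY_START_TAGS` set are hypothetical; the set is an assumption meant to mimic treating tags like `<form>` or `<input>` as the start of body content, as in the pages reported in this bug. A regex tag scanner does not tokenize comments or script contents the way a real HTML parser does, so this is only an illustration.

```python
import re
from typing import Optional

# Hypothetical: tags this sketch treats as "the document body has started".
# A real engine's rule is more involved; this set is an assumption.
BODY_START_TAGS = {"body", "form", "input", "div", "p", "table", "img", "a"}

# Matches start and end tags such as <meta ...>, </head>, <BODY>.
# Comments, <!DOCTYPE> and script contents are not tokenized properly here.
TAG_RE = re.compile(r"<\s*(/?)\s*([A-Za-z][A-Za-z0-9]*)([^>]*)>")

# Finds charset=... inside a tag's attribute text, covering both
# <meta charset="..."> and <meta content="text/html; charset=...">.
CHARSET_RE = re.compile(r"charset\s*=\s*[\"']?\s*([A-Za-z0-9._:-]+)", re.IGNORECASE)


def find_meta_charset(html: str, stop_at_body: bool) -> Optional[str]:
    """Return the first <meta> charset found, or None.

    If stop_at_body is True, give up as soon as a body-level tag appears,
    mimicking the strict behavior described in Comment 1.
    """
    for match in TAG_RE.finditer(html):
        is_end_tag, name, attrs = match.group(1), match.group(2).lower(), match.group(3)
        if is_end_tag:
            continue  # end tags never carry a charset and never start the body
        if name == "meta":
            charset = CHARSET_RE.search(attrs)
            if charset:
                return charset.group(1)
        elif stop_at_body and name in BODY_START_TAGS:
            return None  # strict mode: body reached, stop looking
    return None


if __name__ == "__main__":
    excerpt = (
        "<HTML><HEAD><TITLE>...</TITLE></HEAD><BODY>"
        '<html lang="ja"><head>'
        '<meta http-equiv="Content-Type" content="text/html; charset=EUC-JP" />'
    )
    print(find_meta_charset(excerpt, stop_at_body=True))   # None
    print(find_meta_charset(excerpt, stop_at_body=False))  # EUC-JP
```

On a shortened version of the excerpt from the description, the strict mode returns `None` because `<BODY>` precedes the second `<head>`, while the lenient mode finds `EUC-JP`. The lenient mode has to keep reading further into the document before it can commit to an encoding, which is the kind of cost Comment 1 alludes to.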
Comment 2 Jungshik Shin 2007-07-16 14:38:44 PDT
http://db66.vnet.cn/
is a variation on this. Its structure is:

<script> very long .... </script><form> ...</form> <script ..></script>
<html>
<head>
<meta .... charset ...>
.....
Comment 3 Jungshik Shin 2007-11-06 13:35:32 PST
Another variation:

http://floraexpress.ru/

It starts with an <input> tag. Later, it has the correct <meta> tag declaring the charset.

Comment 4 David Kilzer (:ddkilzer) 2007-11-07 05:29:19 PST
Created attachment 17107
Yahoo! Mail example

This (partial) reduction is an example of an HTML-based mail message (about Sandvox) rendering with the wrong charset due to a "late" <meta> tag. It was originally displayed within Yahoo! Mail, although I stripped out almost all of the Y! Mail bits for the reduction.

Note the rendering of the apostrophes in the body of the message, and compare to Opera 9.22 and Firefox 2.0.0.9.
Comment 5 Alexey Proskuryakov 2007-12-27 00:38:05 PST
Fixed in <http://trac.webkit.org/projects/webkit/changeset/28998>.