Bug 21893 - Character set is incorrect for external scripts in XHTML pages
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Text
Version: 528+ (Nightly build)
Hardware: Mac OS X 10.4
Importance: P2 Normal
Assignee: Alexey Proskuryakov
Reported: 2008-10-26 02:46 PDT by Alex Unigovsky
Modified: 2008-10-28 06:43 PDT

Attachments
proposed fix (122.59 KB, patch)
2008-10-27 04:18 PDT, Alexey Proskuryakov
darin: review+

Description Alex Unigovsky 2008-10-26 02:46:01 PDT
When serving a page with these headers:

-- CUT --
Cache-Control:no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Connection:Keep-Alive
Content-Type:application/xhtml+xml; charset=UTF-8
Date:Sun, 26 Oct 2008 09:25:01 GMT
Expires:Thu, 19 Nov 1981 08:52:00 GMT
Keep-Alive:timeout=15, max=100
Pragma:no-cache
Server:Apache
Transfer-Encoding:Identity
-- CUT --

the encoding of the page is not detected as UTF-8. I've tried inserting <?xml version="1.0" encoding="UTF-8"?> as the page's first line, without success.

The page in question is an XHTML 1.1 page with heavy use of JS and dynamically loaded inline SVG graphs (so the usage of application/xhtml+xml is required).

Version: WebKit nightly r37894
OS: Mac OS X 10.4.11
Doesn't happen in: Opera 9.6, FF 3.0.1
Comment 1 Alexey Proskuryakov 2008-10-26 04:58:30 PDT
Is this the same as bug 18308? Hard to tell from the description, but we certainly do honor the aforementioned ways to specify charset in usual cases, not to mention that UTF-8 is the default for application/xhtml+xml.
Comment 2 Alex Unigovsky 2008-10-26 06:26:39 PDT
Ok. This has nothing to do with static content (it renders correctly). I think I have a simplified test case.

File: test.html.
Served with "Content-Type: application/xhtml+xml; charset=UTF-8"
-- CUT --
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="ru" dir="ltr">
<head>
	<meta http-equiv="Content-Type" content="application/xhtml+xml; charset=UTF-8" />
</head>
<body>
	<div id="a"></div>
	<script type="text/javascript" src="script.js"></script>
</body>
</html>
-- CUT --

File: script.js
Served with "Content-Type: application/x-javascript" (note no charset!)
-- CUT --
var d1 = document.getElementById('a');
var d2 = document.createElement('div');
var x = 'АБВГД';
d2.appendChild(d2.ownerDocument.createTextNode(x));
d1.appendChild(d2);
-- CUT --

The script should insert 5 uppercase Cyrillic letters into div#a, but it does not.
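
For illustration, here is a minimal Node.js sketch of how such a failure can look at the byte level. It assumes the script's UTF-8 bytes get decoded with a single-byte fallback (Latin-1 is used here); the report does not say which encoding WebKit actually picked, so treat that choice as hypothetical. The variable names are made up for this example.

```javascript
// Hypothetical illustration: the Latin-1 fallback is an assumption,
// not something stated in this bug report.
const text = '\u0410\u0411\u0412\u0413\u0414'; // five uppercase Cyrillic letters

// Each of these letters takes two bytes in UTF-8.
const utf8Bytes = Buffer.from(text, 'utf8');

// Decoding those bytes as Latin-1 yields one character per byte,
// so the string literal no longer contains the intended letters.
const misdecoded = utf8Bytes.toString('latin1');

// The damage is reversible as long as the bytes are kept intact.
const roundTripped = Buffer.from(misdecoded, 'latin1').toString('utf8');

console.log(utf8Bytes.length, misdecoded.length, roundTripped === text);
```

The point of the sketch is only that a script file with no declared charset depends entirely on which encoding the browser assumes for it.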
Comment 3 Alex Unigovsky 2008-10-26 06:33:08 PDT
Several additional facts I missed.

1. Script runs ok when embedded into the page (like inside <head>).
2. Script runs ok when test.html is served with text/html.
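
To compare the two cases side by side, a small local server along these lines could serve the test files with the headers described above. This is only a sketch: the function names (contentTypeFor, createReproServer), the files map and the port are hypothetical, and nothing here comes from the WebKit test suite.

```javascript
const http = require('http');

// Served without a charset parameter, as in the report.
const SCRIPT_TYPE = 'application/x-javascript';

// Pick the Content-Type for a request path. pageType is the MIME type
// used for the page, e.g. 'application/xhtml+xml' or 'text/html'.
function contentTypeFor(path, pageType) {
  if (path === '/script.js') return SCRIPT_TYPE;
  return pageType + '; charset=UTF-8';
}

// files maps request paths to file contents (strings, sent as UTF-8).
// The server is returned without listening so the caller picks the port.
function createReproServer(files, pageType) {
  return http.createServer((req, res) => {
    const body = files[req.url];
    if (body === undefined) {
      res.writeHead(404);
      res.end();
      return;
    }
    res.writeHead(200, { 'Content-Type': contentTypeFor(req.url, pageType) });
    res.end(Buffer.from(body, 'utf8'));
  });
}
```

Usage would be something like createReproServer({ '/test.html': page, '/script.js': script }, 'application/xhtml+xml').listen(8000), then the same again with 'text/html' to check fact 2.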
Comment 4 Alexey Proskuryakov 2008-10-26 14:16:00 PDT
Ugh! Confirmed, and this isn't even a ToT regression.

(In reply to comment #2)
>         <meta http-equiv="Content-Type" content="application/xhtml+xml;
> charset=UTF-8" />

FWIW, meta declarations have no effect for XHTML documents (not that this affects the validity of this bug in any way).
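
Put differently, for an XHTML page the encoding comes from the HTTP header or the XML declaration, with UTF-8 as the default, and the meta element is never consulted. A rough sketch of that lookup follows. It is a simplification, not WebKit's code: the function name is hypothetical, byte order marks are ignored, and checking the header before the XML declaration is an assumption made for this example.

```javascript
// Simplified sketch: determine the encoding of an application/xhtml+xml page.
// Note that <meta http-equiv="Content-Type"> is deliberately not examined.
function xhtmlDocumentEncoding(contentTypeHeader, firstLine) {
  // 1. charset parameter of the HTTP Content-Type header
  const fromHeader = /charset\s*=\s*"?([^\s;"]+)/i.exec(contentTypeHeader || '');
  if (fromHeader) return fromHeader[1].toUpperCase();

  // 2. encoding pseudo-attribute of the XML declaration
  const fromXmlDecl = /^<\?xml[^>]*\bencoding\s*=\s*["']([^"']+)["']/i.exec(firstLine || '');
  if (fromXmlDecl) return fromXmlDecl[1].toUpperCase();

  // 3. default for application/xhtml+xml
  return 'UTF-8';
}

console.log(xhtmlDocumentEncoding('application/xhtml+xml; charset=UTF-8', ''));
```

With the headers from the original description, every step of this lookup points to UTF-8, which is why the page's static content renders correctly.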
Comment 5 Alexey Proskuryakov 2008-10-27 04:18:27 PDT
Created attachment 24687
proposed fix

Talk about coincidences - turns out that I found a site affected by this very bug a few days ago, but didn't have the time to investigate it yet.
Comment 6 Darin Adler 2008-10-27 07:52:36 PDT
Comment on attachment 24687
proposed fix

r=me
Comment 7 Alexey Proskuryakov 2008-10-28 06:43:10 PDT
Fixed in <http://trac.webkit.org/changeset/37924>.