Bug 21893

Summary: Character set is incorrect for external scripts in XHTML pages
Product: WebKit
Reporter: Alex Unigovsky <unik>
Component: Text
Assignee: Alexey Proskuryakov <ap>
Status: RESOLVED FIXED
Severity: Normal
CC: ap
Priority: P2
Version: 528+ (Nightly build)
Hardware: Mac
OS: OS X 10.4
Attachments: proposed fix (flags: darin: review+)

Alex Unigovsky
Reported 2008-10-26 02:46:01 PDT
When serving a page with these headers:
-- CUT --
Cache-Control:no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Connection:Keep-Alive
Content-Type:application/xhtml+xml; charset=UTF-8
Date:Sun, 26 Oct 2008 09:25:01 GMT
Expires:Thu, 19 Nov 1981 08:52:00 GMT
Keep-Alive:timeout=15, max=100
Pragma:no-cache
Server:Apache
Transfer-Encoding:Identity
-- CUT --
the encoding of the page is not detected as UTF-8. I've tried inserting <?xml version="1.0" encoding="UTF-8"?> as the page's first line, without success.

The page in question is an XHTML 1.1 page with heavy use of JS and dynamically loaded inline SVG graphs (so serving it as application/xhtml+xml is required).

Version: WebKit nightly r37894
OS: Mac OS X 10.4.11
Doesn't happen in: Opera 9.6, FF 3.0.1
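The page's encoding is declared through the charset parameter of its Content-Type header. As a rough illustration of where that value comes from, here is a minimal sketch that pulls the charset parameter out of a Content-Type header value. The function name charsetFromContentType is hypothetical, and the parsing is deliberately simplified (it assumes no ";" inside quoted values and ignores backslash escapes).

```javascript
// Hypothetical helper: extract the charset parameter from a Content-Type value.
// Simplification (assumption): parameters are split on ';' and quoted values
// contain no ';' or escape sequences.
function charsetFromContentType(value) {
  if (typeof value !== 'string') return null;
  // Everything after the media type is a list of name=value parameters.
  const params = value.split(';').slice(1);
  for (const param of params) {
    const eq = param.indexOf('=');
    if (eq === -1) continue;
    const name = param.slice(0, eq).trim().toLowerCase();
    if (name !== 'charset') continue;
    let charset = param.slice(eq + 1).trim();
    // Strip optional surrounding double quotes.
    if (charset.length >= 2 && charset.startsWith('"') && charset.endsWith('"')) {
      charset = charset.slice(1, -1);
    }
    return charset ? charset.toLowerCase() : null;
  }
  // No charset parameter present.
  return null;
}

console.log(charsetFromContentType('application/xhtml+xml; charset=UTF-8')); // utf-8
console.log(charsetFromContentType('application/x-javascript'));             // null
```

The second call mirrors the script.js response later in this bug: its Content-Type carries no charset at all, so whatever decodes it has to fall back on something else.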
Attachments
proposed fix (122.59 KB, patch)
2008-10-27 04:18 PDT, Alexey Proskuryakov
darin: review+
Alexey Proskuryakov
Comment 1 2008-10-26 04:58:30 PDT
Is this the same as bug 18308? Hard to tell from the description, but we certainly do honor the aforementioned ways to specify charset in usual cases, not to mention that UTF-8 is the default for application/xhtml+xml.
Alex Unigovsky
Comment 2 2008-10-26 06:26:39 PDT
Ok. This has nothing to do with static content (it renders correctly). I think I have a simplified test case.

File: test.html. Served with "Content-Type: application/xhtml+xml; charset=UTF-8"
-- CUT --
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="ru" dir="ltr">
<head>
  <meta http-equiv="Content-Type" content="application/xhtml+xml; charset=UTF-8" />
</head>
<body>
  <div id="a"></div>
  <script type="text/javascript" src="script.js"></script>
</body>
</html>
-- CUT --

File: script.js. Served with "Content-Type: application/x-javascript" (note: no charset!)
-- CUT --
var d1 = document.getElementById('a');
var d2 = document.createElement('div');
var x = 'АБВГД';
d2.appendChild(d2.ownerDocument.createTextNode(x));
d1.appendChild(d2);
-- CUT --

The script should insert 5 uppercase Cyrillic letters into div#a, but it does not.
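In this test case script.js arrives without a charset, so the browser has to pick one by fallback. The sketch below shows roughly the order that HTML specifications of that era describe: the script response's own charset first, then a charset attribute on the <script> element, then the encoding of the referring document. It is an illustration under those assumptions, not the attached patch; externalScriptCharset and its parameter names are hypothetical, and byte-order-mark sniffing is left out.

```javascript
// Hypothetical sketch of choosing the charset for an external script.
// Not WebKit's code; BOM detection is intentionally omitted.
function externalScriptCharset({ responseCharset, scriptCharsetAttr, documentCharset }) {
  // 1. A charset parameter on the script's own Content-Type wins.
  if (responseCharset) return responseCharset.toLowerCase();
  // 2. Otherwise, the charset attribute on the <script> element, if any.
  if (scriptCharsetAttr) return scriptCharsetAttr.toLowerCase();
  // 3. Otherwise, inherit the encoding of the document that references the script.
  //    The test case above (no response charset, no attribute) lands here.
  return documentCharset.toLowerCase();
}

// The situation from the test case: only the document's encoding is known.
console.log(externalScriptCharset({
  responseCharset: null,
  scriptCharsetAttr: null,
  documentCharset: 'UTF-8',
})); // utf-8
```

Under this ordering, the UTF-8 declared for test.html should carry over to script.js, which is what the reporter expected.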
Alex Unigovsky
Comment 3 2008-10-26 06:33:08 PDT
Several additional facts I missed:
1. The script runs fine when embedded in the page (e.g. inside <head>).
2. The script runs fine when test.html is served as text/html.
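An embedded script is decoded together with the page, so it gets the page's UTF-8 decoding; only the separately fetched script.js can end up with a different charset. The sketch below shows what happens to UTF-8 bytes of five Cyrillic capitals (assumed here to be U+0410 to U+0414, "АБВГД") when they are decoded with a single-byte charset instead. ISO-8859-1 is used purely as an example; the bug does not say which charset WebKit actually applied.

```javascript
// Five uppercase Cyrillic letters, U+0410..U+0414 (assumed test string).
const original = '\u0410\u0411\u0412\u0413\u0414';

// UTF-8 encoding: each letter becomes two bytes (0xD0 0x90, 0xD0 0x91, ...).
const bytes = new TextEncoder().encode(original);

// Decoding with the correct charset restores the text.
const asUtf8 = new TextDecoder('utf-8').decode(bytes);

// Decoding as ISO-8859-1: every byte maps directly to U+0000..U+00FF,
// so the 5 letters turn into 10 unrelated characters.
const asLatin1 = Array.from(bytes, (b) => String.fromCharCode(b)).join('');

console.log(bytes.length);          // 10
console.log(asUtf8 === original);   // true
console.log(asLatin1.length);       // 10
console.log(asLatin1 === original); // false
```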
Alexey Proskuryakov
Comment 4 2008-10-26 14:16:00 PDT
Ugh! Confirmed, and this isn't even a ToT regression.

(In reply to comment #2)
> <meta http-equiv="Content-Type" content="application/xhtml+xml;
> charset=UTF-8" />

FWIW, meta declarations have no effect for XHTML documents (not that this affects the validity of this bug in any way).
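Comment 1 notes that UTF-8 is the default for application/xhtml+xml, and the comment above notes that <meta> is ignored for XHTML. A minimal sketch of that precedence for an XML document follows: the HTTP charset parameter, then the encoding pseudo-attribute of a leading XML declaration, then UTF-8. The function name xhtmlDocumentCharset is hypothetical, BOM detection is omitted, and the regular expression accepts only the common version-then-encoding form of the declaration.

```javascript
// Hypothetical sketch: pick the encoding of an XHTML (XML) document.
// Assumptions: no BOM handling; XML declaration must be at the very start.
function xhtmlDocumentCharset(headerCharset, documentText) {
  // 1. The HTTP Content-Type charset parameter, if present.
  if (headerCharset) return headerCharset.toLowerCase();
  // 2. The encoding pseudo-attribute of a leading <?xml ...?> declaration.
  const decl = /^<\?xml\s+version\s*=\s*(["'])[^"']*\1\s+encoding\s*=\s*(["'])([A-Za-z][A-Za-z0-9._-]*)\2/
    .exec(documentText);
  if (decl) return decl[3].toLowerCase();
  // 3. The XML default. A <meta http-equiv> tag is never consulted.
  return 'utf-8';
}

console.log(xhtmlDocumentCharset('UTF-8', '<html/>')); // utf-8
console.log(xhtmlDocumentCharset(null, '<?xml version="1.0" encoding="windows-1251"?><html/>')); // windows-1251
```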
Alexey Proskuryakov
Comment 5 2008-10-27 04:18:27 PDT
Created attachment 24687: proposed fix

Talk about coincidences: it turns out that I found a site affected by this very bug a few days ago, but hadn't had the time to investigate it yet.
Darin Adler
Comment 6 2008-10-27 07:52:36 PDT
Comment on attachment 24687 (proposed fix)

r=me
Alexey Proskuryakov
Comment 7 2008-10-28 06:43:10 PDT