Bug 17732 - An external JavaScript, after changing its src attribute, which has not charset declaration in its script tag and encoding info in corresponding http response, can not be correctly decoded.
Summary: An external JavaScript, after changing its src attribute, which has not chars...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Page Loading (show other bugs)
Version: 528+ (Nightly build)
Hardware: PC OS X 10.5
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-03-09 12:28 PDT by johnnyding
Modified: 2008-03-15 16:48 PDT (History)
2 users (show)

See Also:


Attachments
error encoding content is showed in right side (104.04 KB, image/jpeg)
2008-03-09 12:30 PDT, johnnyding
no flags Details
html file for this test case (715 bytes, text/html)
2008-03-09 12:31 PDT, johnnyding
no flags Details
script file for this test case (180 bytes, application/octet-stream)
2008-03-09 12:32 PDT, johnnyding
no flags Details
patch v1 for fixing this problem (5.00 KB, patch)
2008-03-09 23:02 PDT, johnnyding
no flags Details | Formatted Diff | Diff
patch v2 for fixing this problem. The change is according to Alexey's comments (7.72 KB, patch)
2008-03-10 03:37 PDT, johnnyding
no flags Details | Formatted Diff | Diff
patch v3, based on patch v2, move method scriptCharset from Document class to HTMLScriptElement class (8.46 KB, patch)
2008-03-10 07:22 PDT, johnnyding
no flags Details | Formatted Diff | Diff
patch v3, based on patch v2, move method scriptCharset from Document class to HTMLScriptElement class (8.42 KB, patch)
2008-03-10 07:30 PDT, johnnyding
ap: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description johnnyding 2008-03-09 12:28:35 PDT
Open http://money.163.com(it is sub domain of www.163.com. www.163.com is one of three largest market web portal in China), there's some encoding error on the right side of the web page. IE & FF handles it correctly. See attached image.
Comment 1 johnnyding 2008-03-09 12:30:23 PDT
Created attachment 19618 [details]
error encoding content is showed in right side
Comment 2 johnnyding 2008-03-09 12:31:39 PDT
Created attachment 19619 [details]
html file for this test case
Comment 3 johnnyding 2008-03-09 12:32:19 PDT
Created attachment 19620 [details]
script file for this test case
Comment 4 johnnyding 2008-03-09 12:36:15 PDT
Download both html file and script file to your local file system. The expected behavior is content after "value which is encoded by gb2312 in html file is: " and "same value which is encoded by gb2312 in extern javascript file is: " should have same display, but they are not.

Please see above attached testcase. After digging in WebKit, I found the error encoding content was generated by an external JavaScript after changing its src attribute, which had not charset declaration in its script tag and encoding info in http response when downloading it. So CachedScript will use "latin1" as encoding charset. See methods CachedScript::CachedScript and Loader::didReceiveResponse. That is why content of the external Javascript had been incorrectly decoded.

I think we can fix this problem by using document()->frame()->loader()->encoding() to create CacheScript obj when the external Javascript has not charset declaration in its script tag(getAttribute(charsetAttr) return null string). See method HTMLScriptElement::parseMappedAttribute. This behavior makes sense just like getting correct scriptSrcCharset in HTMLTokenizer::parseTag.  I have changed code in my local build, it works.

A patch is coming soon.
Comment 5 johnnyding 2008-03-09 23:02:44 PDT
Created attachment 19627 [details]
patch v1 for fixing this problem
Comment 6 Alexey Proskuryakov 2008-03-10 00:01:00 PDT
Looking great!

The code in parseMappedAttribute() still doesn't quite match what we have in insertedIntoDocument(). The latter also calls stripWhiteSpace(). I suggest creating a helper function that would be called from both places, rather than duplicating the logic.

+         script_obj.src = "./resources/script-decoding-error-after-setting-src.js";

I'm surprised that this works without layoutTestController.waitUntilDone()/notifyDone().

+    This case is for testing script decoding error after setting its src attribute.<br>

A link to this bug would be helpful. Also, I would omit the word "error" here.

In the future, please set review flag on patches - this ensures that they get into review queue and don't get lost.
Comment 7 johnnyding 2008-03-10 03:37:07 PDT
Created attachment 19630 [details]
patch v2 for fixing this problem. The change is according to Alexey's comments
Comment 8 Alexey Proskuryakov 2008-03-10 04:41:33 PDT
Comment on attachment 19630 [details]
patch v2 for fixing this problem. The change is according to Alexey's comments

Hmm, I would have added scriptCharset() to HTMLScriptElement, not Document - or is there a reason why this isn't possible? Document is so large that one would not want to add stuff to it unless absolutely necessary.
Comment 9 johnnyding 2008-03-10 07:22:54 PDT
Created attachment 19633 [details]
patch v3, based on patch v2, move method scriptCharset from Document class to HTMLScriptElement class
Comment 10 johnnyding 2008-03-10 07:30:24 PDT
Created attachment 19634 [details]
patch v3, based on patch v2, move method scriptCharset from Document class to HTMLScriptElement class
Comment 11 Alexey Proskuryakov 2008-03-10 08:18:04 PDT
Comment on attachment 19634 [details]
patch v3, based on patch v2, move method scriptCharset from Document class to HTMLScriptElement class

r=me

I'm going to tweak wording of some comments a bit when landing.
Comment 12 johnnyding 2008-03-10 08:28:22 PDT
(In reply to comment #11)
> (From update of attachment 19634 [details] [edit])
> r=me
> I'm going to tweak wording of some comments a bit when landing.


Thanks.
Comment 13 Alexey Proskuryakov 2008-03-10 08:39:49 PDT
Committed revision 30928.