Bug 97307

Summary: Default encoding should not be inherited to child frame.
Product: WebKit Reporter: Kangil Han <kangil.han>
Component: TextAssignee: Kangil Han <kangil.han>
Status: NEW    
Severity: Normal CC: abarth, ap, darin, dglazkov, eric, japhet, mitz, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Bug Depends on:    
Bug Blocks: 97054    
Attachments:
Description Flags
patch
ap: review-, buildbot: commit-queue-
patch
ap: review-
patch ap: review-

Kangil Han
Reported 2012-09-21 01:21:26 PDT
Only meta tag charset attribute should be inherited.
Attachments
patch (7.12 KB, patch)
2012-09-21 02:06 PDT, Kangil Han
ap: review-
buildbot: commit-queue-
patch (7.34 KB, patch)
2012-09-24 06:28 PDT, Kangil Han
ap: review-
patch (7.40 KB, patch)
2012-10-07 01:59 PDT, Kangil Han
ap: review-
Kangil Han
Comment 1 2012-09-21 01:35:35 PDT
(In reply to comment #0) > Only meta tag charset attribute should be inherited. Please ignore above comment. Only default encoding shouldn't be inherited.
Kangil Han
Comment 2 2012-09-21 02:06:57 PDT
Build Bot
Comment 3 2012-09-21 03:57:54 PDT
Comment on attachment 165081 [details] patch Attachment 165081 [details] did not pass mac-ews (mac): Output: http://queues.webkit.org/results/13948605 New failing tests: http/tests/security/javascriptURL/xss-ALLOWED-from-javascript-url-sub-frame-2-level.html http/tests/security/javascriptURL/xss-ALLOWED-from-javascript-url-to-javscript-url.html http/tests/security/javascriptURL/xss-ALLOWED-to-javascript-url-sub-frame-2-level.html http/tests/security/javascriptURL/xss-ALLOWED-to-javascript-url-from-javscript-url.html
WebKit Review Bot
Comment 4 2012-09-21 05:59:38 PDT
Comment on attachment 165081 [details] patch Attachment 165081 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/13950616 New failing tests: http/tests/security/javascriptURL/xss-ALLOWED-from-javascript-url-sub-frame-2-level.html http/tests/security/javascriptURL/xss-ALLOWED-from-javascript-url-to-javscript-url.html http/tests/security/javascriptURL/xss-ALLOWED-to-javascript-url-sub-frame-2-level.html http/tests/security/javascriptURL/xss-ALLOWED-to-javascript-url-from-javscript-url.html
Alexey Proskuryakov
Comment 5 2012-09-21 08:23:30 PDT
Comment on attachment 165081 [details] patch r- due to lack of rationale and to broken builds.
Eric Seidel (no email)
Comment 6 2012-09-21 09:46:45 PDT
Yes, please explain in greater detail why we would want this change. Does it match a spec? Does it match FF/Opera/IE? What information do we have to suggest this is the correct change?
Kangil Han
Comment 7 2012-09-23 20:28:56 PDT
(In reply to comment #6) > Yes, please explain in greater detail why we would want this change. Does it match a spec? Does it match FF/Opera/IE? What information do we have to suggest this is the correct change? First of all, thanks Eric for kind elaboration! 1. I haven't found any description to specify about charset inheritance in w3c spec documentation. 2. My IE8 / FF 15.0.1 nicely showed attached two test cases. Regarding 4 newly failed test cases with this patch, I will do deeper investigation and get back with result. I hope I would get back with result in opera as the website doesn't work at this moment. :-)
Eric Seidel (no email)
Comment 8 2012-09-23 21:06:10 PDT
Eric Seidel (no email)
Comment 9 2012-09-23 21:07:20 PDT
If you believe the spec to be incomplete in this regard, please file a bug with HTML5 (there is a link in the spec). We can also CC ian@hixie.ch on this bug for clarification if needed.
Kangil Han
Comment 10 2012-09-24 04:28:28 PDT
(In reply to comment #8) > Does HTML5 cover this? It appears to discuss charset at some length: > http://www.whatwg.org/specs/web-apps/current-work/#determining-the-character-encoding Thanks Eric for above url! I think this patch can be landed because each of two newly added cases is: 1. Parent has good charset definition, but child hasn't -> No. 1 in above url. 2. Parent hasn't charset definition, but child has -> No. 6
Kangil Han
Comment 11 2012-09-24 06:28:13 PDT
Created attachment 165366 [details] patch Fixed 4 layout test crash from 1st patch.
Eric Seidel (no email)
Comment 12 2012-10-03 22:30:18 PDT
ap is really your man here. He's been the most active in the encoding code as of late. The question here is if this matches other browsers and HTML5. :)
Kangil Han
Comment 13 2012-10-03 23:09:11 PDT
(In reply to comment #12) > ap is really your man here. He's been the most active in the encoding code as of late. Oh, I see. I will ask @ap for further review. >The question here is if this matches other browsers and HTML5. :) Sure, I would continuously improve this patch till all doubts being gone away. Thanks! :-)
Alexey Proskuryakov
Comment 14 2012-10-04 09:48:24 PDT
Comment on attachment 165366 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=165366&action=review > LayoutTests/ChangeLog:19 > + * fast/encoding/resources/bad-euckr-encoding.html: Added. > + * fast/encoding/resources/good-euckr-encoding.html: Added. As far as I can see, both files are encoded in exactly the same way. Are these actually "bad" and "good"? If the only difference is that one has a charset declaration and another doesn't, please rename them appropriately. > LayoutTests/fast/encoding/parent-child-1-expected.txt:1 > +ê¸ë¡ë²ì ì¤ì¬ It's better to have expected results that can be assessed by a human. You could just add a paragraph of text saying "The below line should look like XXXX:". Or you could go further and programmatically check text content, printing "PASS"/"FAIL" fro result. > Source/WebCore/ChangeLog:12 > + No. 6 in http://www.whatwg.org/specs/web-apps/current-work/#determining-the-character-encoding describes > + the behavior that the use of nested document's charset definition if no. 1 isn't affected to the case. > + > + However, under current implementation, child frame always uses parent's default encoding. > + Therefore, this patch implemented above no. 6 determination step for improvement of charset detection. I do not understand this explanation. What is the case you are trying to address? Is it that you want charset detector to work in subframes if parent frame doesn't have an encoding declaration? > Source/WebCore/loader/TextResourceDecoder.cpp:687 > +bool TextResourceDecoder::matchEncodingSource(EncodingSource source) > +{ > + return m_source == source; > +} This function has a poor name - it's names as if it had a side effect, but it does not. Since it's only used in one place, I suggest just getting rid of it. There is no need to abstract out "==".
Kangil Han
Comment 15 2012-10-07 01:59:55 PDT
Created attachment 167476 [details] patch Done~
Alexey Proskuryakov
Comment 16 2012-10-08 08:59:53 PDT
Comment on attachment 167476 [details] patch View in context: https://bugs.webkit.org/attachment.cgi?id=167476&action=review > LayoutTests/fast/encoding/parent-child-2-expected.txt:1 > +The test result should look like this : ê¸ë¡ë²ì ì¤ì¬ This is not self-explanatory yet. When you are saying that test result should look like something, this explanation should be separate from test result itself. Here, there is no way to tell if output is correct. The reason why we want self-explanatory tests is that sometimes, one would run them manually in a browser, outside run-webkit-tests. For example, folks may want to run the test in other browsers to compare behavior. > LayoutTests/fast/encoding/parent-child-2.html:13 > + if (window.testRunner) { > + testRunner.dumpAsText(); > + var value = document.getElementById("test").contentDocument.documentElement.getElementsByTagName("p")[0].innerHTML; > + document.getElementById("result").innerHTML = "The test result should look like this : " + value; > + } Why are the two bottom lines inside "if (window.testRunner)"? This makes the test dysfunctional inside browser. > Source/WebCore/ChangeLog:11 > + According to mentioned url, child frame can use its own specified encoding if there is 'no feasible (a.k.a. default)' parent's one. This still appears misleading. Child frame's specified encoding always takes precedence over what's inherited from parent frame. Inherited charset is only used when there is no encoding for child frame. > Source/WebCore/loader/TextResourceDecoder.cpp:687 > +bool TextResourceDecoder::isDefaultEncoding() > +{ > + return m_source == DefaultEncoding; > +} This could be inline in the header. > Source/WebCore/loader/TextResourceDecoder.h:68 > + bool isDefaultEncoding(); This function should be const.
Kangil Han
Comment 17 2012-10-09 08:33:21 PDT
(In reply to comment #16) > This still appears misleading. Child frame's specified encoding always takes precedence over what's inherited from parent frame. Inherited charset is only used when there is no encoding for child frame. You are correct. :-) Child frame can override charset decoder and only inherit parent's one if it hasn't charset definition. My idea on this patch has been twisted because of another bug. I will explain more on following comment. ;)
Kangil Han
Comment 18 2012-10-09 08:46:32 PDT
I started this patch to cover BUG 97054. What I wanted to create test case in BUG 97054 was that test frame has iframe and both don't have charset definition. In iframe, I wanted to describe the failing case, however, in this circumstance, encoding detector didn't work on iframe because parent's default encoding ruled to it. Then where I started to be twisted? :-) parent-child-1.html was failed on 21 Sep, when exactly I started work on this bug. Means encoding inheritance didn't work correctly at that stage so that child frame, without charset definition, couldn't use parent's one. Interestingly, today I've found that case is passed. But, no test case has been added in LayoutTests/fast/encoding. Therefore, if you would think it would be okay, then I will just make two test cases in this bug and continue work what I intended in another patch.
Note You need to log in before you can comment on or make changes to this bug.