Bug 115513

Summary: Improving PageSerializer.cpp to retrieve the subresources specified in @media rules in css
Product: WebKit Reporter: Santosh Mahto <santosh.mahto>
Component: WebCore Misc.Assignee: Nobody <webkit-unassigned>
Status: UNCONFIRMED    
Severity: Normal CC: benjamin, buildbot, commit-queue, darin, eflews.bot, gyuyoung.kim, gyuyoung.kim, kling, rakuco, rego+ews, rniwa, santosh.mahto, vivekg, xan.lopez
Priority: P3    
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Attachments:
Description Flags
Patch
none
Patch
none
Patch
none
Test.html and respective saved file in mhtml Test.mhtml from chrome (one Image is not saved in Test.mhtml file)
none
Test Case Result in Chromium with and without patch
none
Test.html and respective saved file in mhtml Test.mhtml from chrome (one Image is not saved in Test.mhtml file)
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch benjamin: review-

Santosh Mahto
Reported 2013-05-02 05:13:05 PDT
The job of PageSerializer is to collect the all the resources(MainResource + subresource) and provide the resources to MHTMLArchive for mhtml page saving. But Currently the PageSerializer code doesnot collect the subresources specified in CSS stylesheet under @media rule So pageSerializer code doesnot collect the subresources(background image) specified in below CSS code <style> ----------------------------------------------------------------- @media screen and (-webkit-min-device-pixel-ratio:1.5){.imsv{background-image:url(http://static.naver.net/www/u/2013/0425/mma_143839223.jpg);-webkit-background-size:216px 613px;}} ------------------------------------------------ </style> As all kind of subresources specified under @media rule are not collected the mhtml page saving misses the data and refetch the item from netwok. Actual behaviour : PageSerializer should collect subresources data specified under @media rule also
Attachments
Patch (2.08 KB, patch)
2013-05-02 05:23 PDT, Santosh Mahto
no flags
Patch (2.10 KB, patch)
2013-05-02 19:22 PDT, Santosh Mahto
no flags
Patch (26.01 KB, patch)
2013-05-03 02:22 PDT, Santosh Mahto
no flags
Test.html and respective saved file in mhtml Test.mhtml from chrome (one Image is not saved in Test.mhtml file) (18.63 KB, text/html)
2013-05-03 02:28 PDT, Santosh Mahto
no flags
Test Case Result in Chromium with and without patch (55.87 KB, text/html)
2013-05-03 02:59 PDT, Santosh Mahto
no flags
Test.html and respective saved file in mhtml Test.mhtml from chrome (one Image is not saved in Test.mhtml file) (55.87 KB, application/x-zip-compressed)
2013-05-03 03:02 PDT, Santosh Mahto
no flags
Patch (3.31 KB, patch)
2013-05-03 03:19 PDT, Santosh Mahto
no flags
Patch (2.39 KB, patch)
2013-05-03 03:24 PDT, Santosh Mahto
no flags
Patch (2.15 KB, patch)
2013-05-04 00:57 PDT, Santosh Mahto
no flags
Patch (2.25 KB, patch)
2013-05-04 01:21 PDT, Santosh Mahto
no flags
Patch (2.30 KB, patch)
2013-05-05 19:00 PDT, Santosh Mahto
no flags
Patch (2.60 KB, patch)
2013-05-06 01:20 PDT, Santosh Mahto
no flags
Patch (2.60 KB, patch)
2013-05-06 06:57 PDT, Santosh Mahto
no flags
Patch (2.42 KB, patch)
2013-05-07 05:41 PDT, Santosh Mahto
no flags
Patch (9.42 KB, patch)
2013-05-13 06:32 PDT, Santosh Mahto
no flags
Patch (9.37 KB, patch)
2013-05-13 19:17 PDT, Santosh Mahto
benjamin: review-
Santosh Mahto
Comment 1 2013-05-02 05:23:48 PDT
EFL EWS Bot
Comment 2 2013-05-02 05:28:13 PDT
EFL EWS Bot
Comment 3 2013-05-02 05:29:14 PDT
Early Warning System Bot
Comment 4 2013-05-02 05:30:27 PDT
Early Warning System Bot
Comment 5 2013-05-02 05:33:20 PDT
Build Bot
Comment 6 2013-05-02 05:43:36 PDT
Build Bot
Comment 7 2013-05-02 05:50:43 PDT
Build Bot
Comment 8 2013-05-02 06:00:58 PDT
Darin Adler
Comment 9 2013-05-02 09:40:32 PDT
Comment on attachment 200308 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=200308&action=review Thanks for tackling this! > Source/WebCore/ChangeLog:8 > + No new tests (OOPS!). We need to add a test demonstrating that this did not work before and now does work after the fact. Often, the hardest part of making a good bug fix is making the test. We can’t check in a change with this “OOPS” line in the patch. The change log either needs to include a test (preferred) or to explain why there is no test. Not contain the OOPS. > Source/WebCore/page/PageSerializer.cpp:282 > + for (int i = 0; i < mediaRule->length(); ++i) { This won’t compile because it’s comparing an int with an unsigned. We’ll need to correct the type to make it compile. > Source/WebCore/page/PageSerializer.cpp:284 > + if (mediaRule->item(i)->isStyleRule()) > + retrieveResourcesForRule(static_cast<CSSStyleRule*>(mediaRule->item(i))->styleRule(), document); This is a slow way to iterate the style rules. It’s doing multiple function calls and levels of virtual dispatch each time we call item(i). At the very least we need to put medaRule->item(i) into a local variable, but it would be even better to eventually make this code use our internal style data structures rather than the DOM wrappers.
Santosh Mahto
Comment 10 2013-05-02 19:22:28 PDT
Santosh Mahto
Comment 11 2013-05-02 19:41:41 PDT
@ Darin Adler Thanks For reviewing I will take care of your suggestion. I need more clarification on point: "but it would be even better to eventually make this code use our internal style data structures rather than the DOM wrappers." As I see the current PageSerializing code relies on DOM wrappers. Do you mean RenderStyles object by "internal style data structure "
Santosh Mahto
Comment 12 2013-05-03 02:22:05 PDT
WebKit Commit Bot
Comment 13 2013-05-03 02:23:29 PDT
Attachment 200395 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'ChangeLog', u'Test.html', u'Test.mhtml']" exit_code: 1 Test.html:1: One or more unexpected \r (^M) found; better to use only a \n [whitespace/carriage_return] [1] Suppressing further [whitespace/carriage_return] reports for this file. Test.html:9: Line contains tab character. [whitespace/tab] [5] Test.html:10: Line contains tab character. [whitespace/tab] [5] Test.html:11: Line contains tab character. [whitespace/tab] [5] Test.html:15: Line contains tab character. [whitespace/tab] [5] Test.html:16: Line contains tab character. [whitespace/tab] [5] Test.html:17: Line contains tab character. [whitespace/tab] [5] Total errors found: 43 in 3 files If any of these errors are false positives, please file a bug against check-webkit-style.
Santosh Mahto
Comment 14 2013-05-03 02:28:52 PDT
Created attachment 200396 [details] Test.html and respective saved file in mhtml Test.mhtml from chrome (one Image is not saved in Test.mhtml file) Test.html -> Test Page Test.mhtml --> saved file in MHTML format in chrome Issue: Image specfied under @media rule is missed
Santosh Mahto
Comment 15 2013-05-03 02:59:30 PDT
Created attachment 200399 [details] Test Case Result in Chromium with and without patch The Test case for Current ISSUE(CHROMIUM): Test.html : test page(contains two image) Test.mhtml : page saved in mhtml format without patch --> only content of one image is present Test_withpatch.mhtml : page saved in mhtml format with my patch --> content of BOTH image is present Observation: The Image specified under @media css rule is missed completely while collecting the subresources in pageSerializer.cpp. Since issue is related to saved mhtml file so it can be seen only in ports where mhtml file saving is enabled And also only chromium has interface to trigger the save page in mhtml,so I could get test verification with my patch and without in chromium port.
Santosh Mahto
Comment 16 2013-05-03 03:02:16 PDT
Created attachment 200401 [details] Test.html and respective saved file in mhtml Test.mhtml from chrome (one Image is not saved in Test.mhtml file) Test Case Result in Chromium with and without patch The Test case for Current ISSUE(CHROMIUM): Test.html : test page(contains two image) Test.mhtml : page saved in mhtml format without patch --> only content of one image is present Test_withpatch.mhtml : page saved in mhtml format with my patch --> content of BOTH image is present Observation: The Image specified under @media css rule is missed completely while collecting the subresources in pageSerializer.cpp. Since issue is related to saved mhtml file so it can be seen only in ports where mhtml file saving is enabled And also only chromium has interface to trigger the save page in mhtml,so I could get test verification with my patch and without in chromium port.
Santosh Mahto
Comment 17 2013-05-03 03:19:32 PDT
WebKit Commit Bot
Comment 18 2013-05-03 03:20:34 PDT
Attachment 200403 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/ChangeLog', u'Source/WebCore/page/PageSerializer.cpp', u'Test.html']" exit_code: 1 Test.html:1: One or more unexpected \r (^M) found; better to use only a \n [whitespace/carriage_return] [1] Suppressing further [whitespace/carriage_return] reports for this file. Test.html:9: Line contains tab character. [whitespace/tab] [5] Test.html:10: Line contains tab character. [whitespace/tab] [5] Test.html:11: Line contains tab character. [whitespace/tab] [5] Test.html:15: Line contains tab character. [whitespace/tab] [5] Test.html:16: Line contains tab character. [whitespace/tab] [5] Test.html:17: Line contains tab character. [whitespace/tab] [5] Source/WebCore/ChangeLog:8: Line contains tab character. [whitespace/tab] [5] Source/WebCore/ChangeLog:9: Line contains tab character. [whitespace/tab] [5] Source/WebCore/ChangeLog:10: Line contains tab character. [whitespace/tab] [5] Source/WebCore/ChangeLog:12: Line contains tab character. [whitespace/tab] [5] Total errors found: 47 in 3 files If any of these errors are false positives, please file a bug against check-webkit-style.
Santosh Mahto
Comment 19 2013-05-03 03:24:32 PDT
Andreas Kling
Comment 20 2013-05-03 09:59:31 PDT
(In reply to comment #9) > (From update of attachment 200308 [details]) > > Source/WebCore/page/PageSerializer.cpp:284 > > + if (mediaRule->item(i)->isStyleRule()) > > + retrieveResourcesForRule(static_cast<CSSStyleRule*>(mediaRule->item(i))->styleRule(), document); > > This is a slow way to iterate the style rules. It’s doing multiple function calls and levels of virtual dispatch each time we call item(i). At the very least we need to put medaRule->item(i) into a local variable, but it would be even better to eventually make this code use our internal style data structures rather than the DOM wrappers. Right. It would be better if PageSerializer accessed the internal rules through styleSheet->contents()->ruleAt(index). This code currently causes instantiation of CSSOM wrappers for everything in the page's stylesheets.
Santosh Mahto
Comment 21 2013-05-04 00:57:01 PDT
Santosh Mahto
Comment 22 2013-05-04 01:04:19 PDT
Hi Andreas Kling & Darin Adler I have updated the code as per yours comments to use internal styesheet content to get style data. Please put your further views on new code. (In reply to comment #20) > (In reply to comment #9) > > (From update of attachment 200308 [details] [details]) > > > Source/WebCore/page/PageSerializer.cpp:284 > > > + if (mediaRule->item(i)->isStyleRule()) > > > + retrieveResourcesForRule(static_cast<CSSStyleRule*>(mediaRule->item(i))->styleRule(), document); > > > > This is a slow way to iterate the style rules. It’s doing multiple function calls and levels of virtual dispatch each time we call item(i). At the very least we need to put medaRule->item(i) into a local variable, but it would be even better to eventually make this code use our internal style data structures rather than the DOM wrappers. > > Right. It would be better if PageSerializer accessed the internal rules through styleSheet->contents()->ruleAt(index). This code currently causes instantiation of CSSOM wrappers for everything in the page's stylesheets.
Santosh Mahto
Comment 23 2013-05-04 01:21:09 PDT
Benjamin Poulain
Comment 24 2013-05-04 17:59:45 PDT
Comment on attachment 200522 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=200522&action=review > Source/WebCore/ChangeLog:11 > + Test file : Test.html (saving this file in mhtml format) You forgot to include the test in this patch. > Source/WebCore/page/PageSerializer.cpp:281 > + WTF::Vector<RefPtr<StyleRuleBase> > rules = mediaStyle->childRules(); Remove the WTF:: > Source/WebCore/page/PageSerializer.cpp:284 > + unsigned ruleSize = rules.size(); > + for (unsigned i = 0; i < ruleSize; ++i) { unsigned -> size_t. > Source/WebCore/page/PageSerializer.cpp:286 > + if (rules[i]->isStyleRule()) > + retrieveResourcesForRule(static_cast<StyleRule*>(rules[i].get()), document); Because operator[] is a checked access, you shoudl store rules[i] in a temporary variable: RefPtr<StyleRuleBase> rule = rules[i]; if (rule->etc
Santosh Mahto
Comment 25 2013-05-04 22:54:11 PDT
(In reply to comment #24) > (From update of attachment 200522 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=200522&action=review > > > Source/WebCore/ChangeLog:11 > > + Test file : Test.html (saving this file in mhtml format) > > You forgot to include the test in this patch. I had attached the test patch before with name Test.html and respective saved file in mhtml Test.mhtml from chromium (one Image is not saved in Test.mhtml file)). Isn't is enough ?? Should I attach agin same Test.html again. Actually I am not very used to bugzilla (this is my first time), so plz guide. I will work on rest of comment.
Santosh Mahto
Comment 26 2013-05-05 19:00:16 PDT
Benjamin Poulain
Comment 27 2013-05-05 19:37:55 PDT
Comment on attachment 200601 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=200601&action=review > I had attached the test patch before with name Test.html and respective saved file in mhtml Test.mhtml from chromium (one Image is not saved in Test.mhtml file)). Isn't is enough ?? > Should I attach agin same Test.html again. > Actually I am not very used to bugzilla (this is my first time), so plz guide. Manual test attached on bugzilla are irrelevant for a patch. You need to have an automated test with your patch. Typically tests are written as LayoutTest or API tests. I am unfamiliar with the testing of PageSerializer, you will probably find how it is tested by looking at the previous changes on that class. > Source/WebCore/page/PageSerializer.cpp:285 > + RefPtr<StyleRuleBase> rule = rules[i]; This should be a reference to avoid the ref(). (or alternatively get the pointer in the temporary).
Santosh Mahto
Comment 28 2013-05-06 01:20:27 PDT
Santosh Mahto
Comment 29 2013-05-06 01:38:19 PDT
(In reply to comment #27) > (From update of attachment 200601 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=200601&action=review > > > I had attached the test patch before with name Test.html and respective saved file in mhtml Test.mhtml from chromium (one Image is not saved in Test.mhtml file)). Isn't is enough ?? > > Should I attach agin same Test.html again. > > Actually I am not very used to bugzilla (this is my first time), so plz guide. > > Manual test attached on bugzilla are irrelevant for a patch. You need to have an automated test with your patch. > > Typically tests are written as LayoutTest or API tests. I am unfamiliar with the testing of PageSerializer, you will probably find how it is tested by looking at the previous changes on that class. This class is triggred when page save in mhtml format is triggred by UI e.g save page as in chromium (when save as MHTMl flag is true) and add to reading list is done in safari(I checked ios5). As this code is triggred by UI interface, layout test case is not possible. On checking previus chekin I found that testcases were under chromium folder. and all the checkin reffered to that unit test as Tests: unit-tests in chromium/tests/WebPageNewSerializerTest.cpp So I think currently the Test file for this patch is not posible(your views r welcome) But definitely we can verified about patch with already test case and result in this bug. > > Source/WebCore/page/PageSerializer.cpp:285 > > + RefPtr<StyleRuleBase> rule = rules[i]; > > This should be a reference to avoid the ref(). (or alternatively get the pointer in the temporary). Done .....
Andreas Kling
Comment 30 2013-05-06 06:39:52 PDT
Comment on attachment 200642 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=200642&action=review > Source/WebCore/ChangeLog:17 > + Test file : This code is triggred by external UI interface to save > + page as mhtml format(chromium save page in mhtml and safari add to > + Reading list).So The automated test case for this code cant be > + produced. > + for Test case and Patch results(with and without) in chromium can be seen at > + attachement at https://bugs.webkit.org/show_bug.cgi?id=115513 Wait, so you're saying that this code is currently unreachable? > Source/WebCore/page/PageSerializer.cpp:281 > + Vector<RefPtr<StyleRuleBase> > rules = mediaStyle->childRules(); You can avoid reffing and dereffing every StyleRuleBase here by using a const reference: const Vector<RefPtr<StyleRuleBase> >& rules = mediaStyle->childRules();
Santosh Mahto
Comment 31 2013-05-06 06:50:07 PDT
(In reply to comment #30) > (From update of attachment 200642 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=200642&action=review > > > Source/WebCore/ChangeLog:17 > > + Test file : This code is triggred by external UI interface to save > > + page as mhtml format(chromium save page in mhtml and safari add to > > + Reading list).So The automated test case for this code cant be > > + produced. > > + for Test case and Patch results(with and without) in chromium can be seen at > > + attachement at https://bugs.webkit.org/show_bug.cgi?id=115513 > > Wait, so you're saying that this code is currently unreachable? No.. The Complete PageSerializer class code is reachable when save page as mhtml is triggered. this class is used to collect the resources and main page. and Triggered from WebPage::getContentsAsMHTMLData(uint64_t callbackID, bool useBinaryEncoding). IOS5 safari add to reading list completely dependent on it Actually I dont know how the test this apis > > Source/WebCore/page/PageSerializer.cpp:281 > > + Vector<RefPtr<StyleRuleBase> > rules = mediaStyle->childRules(); > > You can avoid reffing and dereffing every StyleRuleBase here by using a const reference: > const Vector<RefPtr<StyleRuleBase> >& rules = mediaStyle->childRules();
Santosh Mahto
Comment 32 2013-05-06 06:57:33 PDT
Darin Adler
Comment 33 2013-05-06 09:10:45 PDT
Comment on attachment 200668 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=200668&action=review > Source/WebCore/page/PageSerializer.cpp:280 > + StyleRuleMedia* mediaStyle = static_cast<StyleRuleMedia*>(styleSheet->contents()->ruleAt(i)); I think this might read better without this local variable. > Source/WebCore/page/PageSerializer.cpp:283 > + size_t ruleSize = rules.size(); I think “rule size” is a strange name for this, since it’s not the size of a rule. I suggest either rulesSize or just size for this local. > Source/WebCore/page/PageSerializer.cpp:285 > + const RefPtr<StyleRuleBase>& rule = rules[i]; This is OK, but this is better: StyleRuleBase* rule = rules[i].get(); Since we have to do the get() on the next line anyway, it’s better to do it right away to avoid const RefPtr&.
Darin Adler
Comment 34 2013-05-06 09:11:35 PDT
Comment on attachment 200668 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=200668&action=review > Source/WebCore/ChangeLog:15 > + Test file : This code is triggred by external UI interface to save > + page as mhtml format(chromium save page in mhtml and safari add to > + Reading list).So The automated test case for this code cant be > + produced. We do have tests for serialization; I think it may be wrong to say a test for this can’t be done in WebKit.
Santosh Mahto
Comment 35 2013-05-07 04:57:03 PDT
(In reply to comment #34) > (From update of attachment 200668 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=200668&action=review > > > Source/WebCore/ChangeLog:15 > > + Test file : This code is triggred by external UI interface to save > > + page as mhtml format(chromium save page in mhtml and safari add to > > + Reading list).So The automated test case for this code cant be > > + produced. > > We do have tests for serialization; I think it may be wrong to say a test for this can’t be done in WebKit. http://trac.webkit.org/changeset/92883/trunk/Source/WebCore/page/PageSerializer.cpp In this changelist of PageSerializer.cpp the comments made by Benjamin Poulain regarding tests are as: ----------- No new tests because the test infrastructure for this does not exist with layout tests and the Chromium unit test for the serializer are disabled. page/PageSerializer.cpp: ------------------------- So I am still not sure. If you have any positive info about pasgeserializer testing please share. else I could think of adding one.
Santosh Mahto
Comment 36 2013-05-07 05:41:46 PDT
WebKit Commit Bot
Comment 37 2013-05-07 05:43:17 PDT
Attachment 200895 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/ChangeLog', u'Source/WebCore/page/PageSerializer.cpp']" exit_code: 1 Source/WebCore/ChangeLog:13: Line contains tab character. [whitespace/tab] [5] Total errors found: 1 in 2 files If any of these errors are false positives, please file a bug against check-webkit-style.
Santosh Mahto
Comment 38 2013-05-13 06:32:24 PDT
WebKit Commit Bot
Comment 39 2013-05-13 06:34:28 PDT
Attachment 201554 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/ChangeLog', u'Source/WebCore/page/PageSerializer.cpp', u'Tools/ChangeLog', u'Tools/TestWebKitAPI/PlatformEfl.cmake', u'Tools/TestWebKitAPI/Tests/WebKit2/MhtmlPageContent.cpp', u'Tools/TestWebKitAPI/Tests/WebKit2/bg.png', u'Tools/TestWebKitAPI/Tests/WebKit2/mhtml-content.html']" exit_code: 1 Source/WebCore/ChangeLog:6: Line contains tab character. [whitespace/tab] [5] Total errors found: 1 in 7 files If any of these errors are false positives, please file a bug against check-webkit-style.
Santosh Mahto
Comment 40 2013-05-13 07:46:48 PDT
I have added on WK2 unit test file for API "WKPageGetContentsAsMHTMLData" for all the ports.(i added it under efl port) MhtmlPageConent.cpp has two test case: one for simple page and second related to this bug. The port which has ENABLE_MHTML = ON and uses WK2 need to add this file to respective build files. This Test file covers the test case for this bug also.
Santosh Mahto
Comment 41 2013-05-13 19:17:39 PDT
Benjamin Poulain
Comment 42 2013-05-13 19:52:26 PDT
Comment on attachment 201666 [details] Patch I am afraid API test is not the right tool for the job. API tests are good to test APIs like WKPageGetContentsAsMHTMLData, but they are not a tool to verify content feature. Especially one like mthml where you need to test a lot of variations.
Note You need to log in before you can comment on or make changes to this bug.