RESOLVED FIXED 55859
[Chromium] The method that retrieves all resources from a page should be moved from Chromium to WebKit
https://bugs.webkit.org/show_bug.cgi?id=55859
Summary [Chromium] The method that retrieves all resources from a page should be move...
Jay Civelli
Reported 2011-03-06 23:46:15 PST
In the Chromium WebKit glue code, dom_operations.[h, cc], a method called GetAllSavableResourceLinksForCurrentPage is used to retrieve all the resources from a page. It should be moved to WebKit. One of the key reasons is that we want that method to also report CSS resources, and keeping it in Chromium would required to expose unnecessarily the CSS info through the Chromium WebKit API.
Attachments
Initial patch (24.73 KB, patch)
2011-03-06 23:59 PST, Jay Civelli
levin: review-
Fix build and style issue (25.26 KB, patch)
2011-03-07 23:22 PST, Jay Civelli
levin: review-
Addressing David's remarks. (25.17 KB, patch)
2011-03-15 17:02 PDT, Jay Civelli
no flags
Fixed style. (25.28 KB, patch)
2011-03-15 17:20 PDT, Jay Civelli
abarth: review-
Addressed Adam's comments (27.51 KB, patch)
2011-03-16 13:54 PDT, Jay Civelli
no flags
Fixed style... (27.50 KB, patch)
2011-03-16 13:59 PDT, Jay Civelli
levin: review-
Addressed David's comments (28.97 KB, patch)
2011-03-16 15:24 PDT, Jay Civelli
no flags
Patch (29.78 KB, patch)
2011-03-22 10:04 PDT, Jay Civelli
no flags
One more attempt at landing this. (27.75 KB, patch)
2011-03-23 16:21 PDT, Jay Civelli
no flags
Jay Civelli
Comment 1 2011-03-06 23:59:40 PST
Created attachment 84924 [details] Initial patch
WebKit Review Bot
Comment 2 2011-03-07 00:04:41 PST
Attachment 84924 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebKit/chromium/ChangeLog', u'Sourc..." exit_code: 1 Last 3072 characters of output: izerTest.cpp:34: Alphabetical sorting problem. [build/include_order] [4] Source/WebKit/chromium/tests/WebPageSerializerTest.cpp:36: Found header this file implements after other header. Should be: config.h, primary header, blank line, and then alphabetically sorted. [build/include_order] [4] Source/WebKit/chromium/tests/WebPageSerializerTest.cpp:62: Should only a single space after a punctuation in a comment. [whitespace/comments] [5] Source/WebKit/chromium/tests/data/pageserialization/simple_iframe.html:1: One or more unexpected \r (^M) found; better to use only a \n [whitespace/carriage_return] [1] Suppressing further [whitespace/carriage_return] reports for this file. Source/WebKit/chromium/public/WebPageSerializer.h:77: The parameter name "view" adds no information, so it should be removed. [readability/parameter_name] [5] Source/WebKit/chromium/src/WebPageSerializer.cpp:62: Use 0 instead of NULL. [readability/null] [5] Source/WebKit/chromium/src/WebPageSerializer.cpp:69: Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side. [whitespace/operators] [4] Source/WebKit/chromium/src/WebPageSerializer.cpp:70: Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side. [whitespace/operators] [4] Source/WebKit/chromium/src/WebPageSerializer.cpp:71: Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side. [whitespace/operators] [4] Source/WebKit/chromium/src/WebPageSerializer.cpp:74: Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side. [whitespace/operators] [4] Source/WebKit/chromium/src/WebPageSerializer.cpp:75: Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side. [whitespace/operators] [4] Source/WebKit/chromium/src/WebPageSerializer.cpp:76: Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side. [whitespace/operators] [4] Source/WebKit/chromium/src/WebPageSerializer.cpp:106: Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side. [whitespace/operators] [4] Source/WebKit/chromium/src/WebPageSerializer.cpp:224: One line control clauses should not use braces. [whitespace/braces] [4] Source/WebKit/chromium/tests/data/pageserialization/simple_page.html:1: One or more unexpected \r (^M) found; better to use only a \n [whitespace/carriage_return] [1] Suppressing further [whitespace/carriage_return] reports for this file. Source/WebKit/chromium/tests/data/pageserialization/top_frame.html:1: One or more unexpected \r (^M) found; better to use only a \n [whitespace/carriage_return] [1] Suppressing further [whitespace/carriage_return] reports for this file. Total errors found: 266 in 11 files If any of these errors are false positives, please file a bug against check-webkit-style.
WebKit Review Bot
Comment 3 2011-03-07 00:10:26 PST
WebKit Review Bot
Comment 4 2011-03-07 01:03:00 PST
Dimitri Glazkov (Google)
Comment 5 2011-03-07 07:08:45 PST
Build breakage?
David Levin
Comment 6 2011-03-07 10:48:11 PST
Comment on attachment 84924 [details] Initial patch r- I may not be the one to r+ but it seems clear that this currently should get an r- due to lots of style issues and it broke the build (and the copyright years say 2010 instead of 2011).
Jay Civelli
Comment 7 2011-03-07 23:22:28 PST
Created attachment 85026 [details] Fix build and style issue
David Levin
Comment 8 2011-03-15 10:33:31 PDT
Comment on attachment 85026 [details] Fix build and style issue View in context: https://bugs.webkit.org/attachment.cgi?id=85026&action=review Just a few things to address and then this should be fine. > Source/WebKit/chromium/src/WebDataSourceImpl.cpp:38 > +#include <googleurl/src/gurl.h> Why do you need to include this? > Source/WebKit/chromium/src/WebPageSerializer.cpp:34 > +#include "DocumentLoader.h" Where is the document loaded used in here? > Source/WebKit/chromium/src/WebPageSerializer.cpp:63 > + if (element.hasTagName(HTMLNames::imgTag) || element.hasTagName(HTMLNames::scriptTag)) { Single line clause shouldn't have surrounding {} > Source/WebKit/chromium/src/WebPageSerializer.cpp:82 > + // TODO(jnd): Add support for extracting links of sub-resources which s/TODO({^)]*)/FIXME/ > Source/WebKit/chromium/src/WebPageSerializer.cpp:84 > + // See bug: http://b/issue?id=1111667. Internal bug reference doesn't seem appropriate. > Source/WebKit/chromium/src/WebPageSerializer.cpp:109 > + if (visitedFrames->find(frame) == notFound) While find is more flexible, I wonder if there should be a contains method as well which just does this but would be more readable in these pretty common cases. Reading more of this code, I like this idea more and more. > Source/WebKit/chromium/src/WebPageSerializer.cpp:120 > + if (!url.protocolIs("http") && !url.protocolIs("https") && !url.protocolIs("file")) url.protocolInHTTPFamily url.protocolIsLocalFile > Source/WebKit/chromium/src/WebPageSerializer.cpp:163 > + if (node->isElementNode()) { if (!node->isElementNode()) continue; (Less indenting and "fail fast" approach.) > Source/WebKit/chromium/tests/WebPageSerializerTest.cpp:55 > + WebPageSerializerTest() : m_webView(0), m_supportedSchemes(3U) 3U? Do we need the U? > Source/WebKit/chromium/tests/WebPageSerializerTest.cpp:59 > + m_supportedSchemes[1] = "file"; Should this be [2] ? This seems to indicate a hole in the test coverage. Should it have https resources in it? What about file resources? What about unsupported schemes?
Jay Civelli
Comment 9 2011-03-15 17:02:57 PDT
Created attachment 85879 [details] Addressing David's remarks.
Jay Civelli
Comment 10 2011-03-15 17:07:33 PDT
(In reply to comment #8) > (From update of attachment 85026 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=85026&action=review > > Just a few things to address and then this should be fine. > > > Source/WebKit/chromium/src/WebDataSourceImpl.cpp:38 > > +#include <googleurl/src/gurl.h> > > Why do you need to include this? Removed. > > Source/WebKit/chromium/src/WebPageSerializer.cpp:34 > > +#include "DocumentLoader.h" > > Where is the document loaded used in here? At line 134: KURL frameURL = frame->loader()->documentLoader()->request().url(); > > Source/WebKit/chromium/src/WebPageSerializer.cpp:63 > > + if (element.hasTagName(HTMLNames::imgTag) || element.hasTagName(HTMLNames::scriptTag)) { > > Single line clause shouldn't have surrounding {} I thought when following else statement had {} then we should use them on the "then" clause as well? > > Source/WebKit/chromium/src/WebPageSerializer.cpp:82 > > + // TODO(jnd): Add support for extracting links of sub-resources which > > s/TODO({^)]*)/FIXME/ Done. > > Source/WebKit/chromium/src/WebPageSerializer.cpp:84 > > + // See bug: http://b/issue?id=1111667. > > Internal bug reference doesn't seem appropriate. Removed. > > Source/WebKit/chromium/src/WebPageSerializer.cpp:109 > > + if (visitedFrames->find(frame) == notFound) > > While find is more flexible, I wonder if there should be a contains method as well which just does this but would be more readable in these pretty common cases. Here I am using a Vector, not a WebVector, and it does not have a contain. Actually I now realize I am using the WebVector::find() in the test only, and I would probably only need a contains method. So I replaced find() with contains(). > Reading more of this code, I like this idea more and more. > > > Source/WebKit/chromium/src/WebPageSerializer.cpp:120 > > + if (!url.protocolIs("http") && !url.protocolIs("https") && !url.protocolIs("file")) > > url.protocolInHTTPFamily > url.protocolIsLocalFile Done > > Source/WebKit/chromium/src/WebPageSerializer.cpp:163 > > + if (node->isElementNode()) { > > if (!node->isElementNode()) > continue; > > (Less indenting and "fail fast" approach.) OK. > > Source/WebKit/chromium/tests/WebPageSerializerTest.cpp:55 > > + WebPageSerializerTest() : m_webView(0), m_supportedSchemes(3U) > > 3U? Do we need the U? Yes, the fails to compile otherwise. (it's using the wrong constructor in that case). > > Source/WebKit/chromium/tests/WebPageSerializerTest.cpp:59 > > + m_supportedSchemes[1] = "file"; > > Should this be [2] ? Arg! Yes. > This seems to indicate a hole in the test coverage. Should it have https resources in it? What about file resources? What about unsupported schemes? Good idea, added.
WebKit Review Bot
Comment 11 2011-03-15 17:10:33 PDT
Attachment 85879 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebKit/chromium/ChangeLog', u'Sourc..." exit_code: 1 Source/WebKit/chromium/public/WebVector.h:129: One or more unexpected \r (^M) found; better to use only a \n [whitespace/carriage_return] [1] Suppressing further [whitespace/carriage_return] reports for this file. Total errors found: 7 in 13 files If any of these errors are false positives, please file a bug against check-webkit-style.
Jay Civelli
Comment 12 2011-03-15 17:20:12 PDT
Created attachment 85884 [details] Fixed style.
Adam Barth
Comment 13 2011-03-15 17:33:45 PDT
Comment on attachment 85884 [details] Fixed style. View in context: https://bugs.webkit.org/attachment.cgi?id=85884&action=review WebPageSerializer.cpp is such great sadness. I re-wrote the whole thing, but never landed the new version. :( Almost every line of code in that file is wrong. > Source/WebKit/chromium/src/WebPageSerializer.cpp:60 > +KURL getSubResourceURLFromElement(const Element& element) I usually pass Elements by pointer, not by reference. > Source/WebKit/chromium/src/WebPageSerializer.cpp:92 > + if (value.isNull() || value.isEmpty() || value.startsWith("javascript:", false)) value.startsWith("javascript:", false) is wrong. It could start with " javascript:" also, for example. > Source/WebKit/chromium/src/WebPageSerializer.cpp:106 > + if ((element.hasTagName(HTMLNames::iframeTag) || element.hasTagName(HTMLNames::frameTag)) > + && element.isFrameOwnerElement()) { What if it's an object or an embed element that has a frame? > Source/WebKit/chromium/src/WebPageSerializer.cpp:114 > + if (url.isEmpty() || url.isNull() || !url.isValid()) isEmpty includes isNull, right?
Jay Civelli
Comment 14 2011-03-16 13:53:34 PDT
(In reply to comment #13) > (From update of attachment 85884 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=85884&action=review > > WebPageSerializer.cpp is such great sadness. I re-wrote the whole thing, but never landed the new version. :( > > Almost every line of code in that file is wrong. And it's only going to get worse, as I am planning to add some more code in that file :-) I looked at the other file briefly (WebPageSerializerImpl.cpp) and was also concerned. Any reason why you did not land your new version? Anything I can do to help? > > Source/WebKit/chromium/src/WebPageSerializer.cpp:60 > > +KURL getSubResourceURLFromElement(const Element& element) > > I usually pass Elements by pointer, not by reference. OK > > Source/WebKit/chromium/src/WebPageSerializer.cpp:92 > > + if (value.isNull() || value.isEmpty() || value.startsWith("javascript:", false)) > > value.startsWith("javascript:", false) is wrong. It could start with " javascript:" also, for example. OK, now stripping white-space. > > Source/WebKit/chromium/src/WebPageSerializer.cpp:106 > > + if ((element.hasTagName(HTMLNames::iframeTag) || element.hasTagName(HTMLNames::frameTag)) > > + && element.isFrameOwnerElement()) { > > What if it's an object or an embed element that has a frame? Now supported. > > Source/WebKit/chromium/src/WebPageSerializer.cpp:114 > > + if (url.isEmpty() || url.isNull() || !url.isValid()) > > isEmpty includes isNull, right? Oh, yeah.
Jay Civelli
Comment 15 2011-03-16 13:54:21 PDT
Created attachment 85968 [details] Addressed Adam's comments
WebKit Review Bot
Comment 16 2011-03-16 13:56:11 PDT
Attachment 85968 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebKit/chromium/ChangeLog', u'Sourc..." exit_code: 1 Source/WebKit/chromium/tests/data/pageserialization/object_iframe.html:1: One or more unexpected \r (^M) found; better to use only a \n [whitespace/carriage_return] [1] Suppressing further [whitespace/carriage_return] reports for this file. Source/WebKit/chromium/src/WebPageSerializer.cpp:90: One line control clauses should not use braces. [whitespace/braces] [4] Source/WebKit/chromium/tests/data/pageserialization/embed_iframe.html:1: One or more unexpected \r (^M) found; better to use only a \n [whitespace/carriage_return] [1] Suppressing further [whitespace/carriage_return] reports for this file. Total errors found: 11 in 15 files If any of these errors are false positives, please file a bug against check-webkit-style.
Jay Civelli
Comment 17 2011-03-16 13:59:00 PDT
Created attachment 85970 [details] Fixed style...
WebKit Review Bot
Comment 18 2011-03-16 14:01:09 PDT
Attachment 85970 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebKit/chromium/ChangeLog', u'Sourc..." exit_code: 1 Source/WebKit/chromium/src/WebPageSerializer.cpp:90: One line control clauses should not use braces. [whitespace/braces] [4] Total errors found: 1 in 15 files If any of these errors are false positives, please file a bug against check-webkit-style.
David Levin
Comment 19 2011-03-16 14:44:53 PDT
Comment on attachment 85970 [details] Fixed style... View in context: https://bugs.webkit.org/attachment.cgi?id=85970&action=review r- because I want Vector::contains :) > Source/WebKit/chromium/src/WebPageSerializer.cpp:61 > +{ ASSERT(element); > Source/WebKit/chromium/src/WebPageSerializer.cpp:63 > + if (element->hasTagName(HTMLNames::imgTag) || element->hasTagName(HTMLNames::scriptTag)) { Style bot missed this one but it is still true: "One line control clauses should not use braces. [whitespace/braces] [4]" > Source/WebKit/chromium/src/WebPageSerializer.cpp:65 > + } else if (element->hasTagName(HTMLNames::inputTag)) { Leave the braces for this if. > Source/WebKit/chromium/src/WebPageSerializer.cpp:72 > + || element->hasTagName(HTMLNames::tdTag)) { Remove them here. > Source/WebKit/chromium/src/WebPageSerializer.cpp:77 > + || element->hasTagName(HTMLNames::insTag)) { Remove them here. > Source/WebKit/chromium/src/WebPageSerializer.cpp:79 > + } else if (element->hasTagName(HTMLNames::linkTag)) { Leave the braces for this if. > Source/WebKit/chromium/src/WebPageSerializer.cpp:86 > + } else if (element->hasTagName(HTMLNames::objectTag)) { Remove them here. >> Source/WebKit/chromium/src/WebPageSerializer.cpp:90 >> + } > > One line control clauses should not use braces. [whitespace/braces] [4] What Style bot said. > Source/WebKit/chromium/src/WebPageSerializer.cpp:115 > + if (visitedFrames->find(frame) == notFound) May I recommend adding contains to Vector (just implement in terms of find)? > Source/WebKit/chromium/src/WebPageSerializer.cpp:130 > + if (resourceURLs->find(url) == notFound) Use contains. > Source/WebKit/chromium/src/WebPageSerializer.cpp:159 > + if (visitedFrames->find(frame) != notFound) Use contains. > Source/WebKit/chromium/src/WebPageSerializer.cpp:162 > + if (frameURLs->find(frameURL) == notFound) Use contains.
Jay Civelli
Comment 20 2011-03-16 15:24:13 PDT
Created attachment 85987 [details] Addressed David's comments Addressed all comments.
David Levin
Comment 21 2011-03-16 15:28:28 PDT
Comment on attachment 85987 [details] Addressed David's comments View in context: https://bugs.webkit.org/attachment.cgi?id=85987&action=review > Source/JavaScriptCore/ChangeLog:10 > + (WTF::::contains): Consider fixing the to be the right function names in the future but ok.
WebKit Commit Bot
Comment 22 2011-03-17 12:47:02 PDT
The commit-queue encountered the following flaky tests while processing attachment 85987 [details]: http/tests/inspector/network/network-size.html bug 56581 (author: pfeldman@chromium.org) The commit-queue is continuing to process your patch.
WebKit Commit Bot
Comment 23 2011-03-17 12:49:43 PDT
Comment on attachment 85987 [details] Addressed David's comments Clearing flags on attachment: 85987 Committed r81377: <http://trac.webkit.org/changeset/81377>
WebKit Commit Bot
Comment 24 2011-03-17 12:49:50 PDT
All reviewed patches have been landed. Closing bug.
James Simonsen
Comment 25 2011-03-21 13:28:31 PDT
The new unit test from this patch is failing already. This is being rolled out in bug 56765. The failure is: [ RUN ] WebPageSerializerTest.MultipleFrames .\tests\WebPageSerializerTest.cpp(182): error: Value of: resources.size() Actual: 5 Expected: 6 .\tests\WebPageSerializerTest.cpp(186): error: Value of: webVectorContains(resources, "http://www.test.com/music.mid") Actual: false Expected: true [ FAILED ] WebPageSerializerTest.MultipleFrames (16 ms)
David Levin
Comment 26 2011-03-21 13:31:37 PDT
Reopened.
Jay Civelli
Comment 27 2011-03-22 10:04:21 PDT
Jay Civelli
Comment 28 2011-03-22 10:07:36 PDT
Another attempt at landing this patch. I am not entirely sure while the WebPageSerializerTest.MultipleFrames test failed, as it passes in both debug and release mode on my box. There was an unbalanced tag for the resource reported as missing: <embed src="music.mid"></object> that I fixed, which should hopefully fix the problem.
WebKit Commit Bot
Comment 29 2011-03-22 12:18:16 PDT
Comment on attachment 86467 [details] Patch Clearing flags on attachment: 86467 Committed r81686: <http://trac.webkit.org/changeset/81686>
WebKit Commit Bot
Comment 30 2011-03-22 12:18:24 PDT
All reviewed patches have been landed. Closing bug.
James Simonsen
Comment 31 2011-03-22 19:51:45 PDT
This broke the main Chrome Win bot again. I don't know why it only affects that one. The Win WebKit Canary and Deps bots run it just fine.
Jay Civelli
Comment 32 2011-03-23 16:21:20 PDT
Created attachment 86718 [details] One more attempt at landing this. Not sure why the test fails on one of the bot. One of the embed src is not reported. This could be caused by a plugin. Disabling that resource in the test and adding a FIXME: for now.
Jay Civelli
Comment 33 2011-03-23 16:33:56 PDT
Reopening for rerelanding.
WebKit Commit Bot
Comment 34 2011-03-23 19:40:36 PDT
The commit-queue encountered the following flaky tests while processing attachment 86718 [details]: http/tests/loading/bad-server-subframe.html bug 56985 (author: mjs@apple.com) The commit-queue is continuing to process your patch.
WebKit Commit Bot
Comment 35 2011-03-23 19:44:15 PDT
Comment on attachment 86718 [details] One more attempt at landing this. Clearing flags on attachment: 86718 Committed r81846: <http://trac.webkit.org/changeset/81846>
WebKit Commit Bot
Comment 36 2011-03-23 19:44:21 PDT
All reviewed patches have been landed. Closing bug.
Note You need to log in before you can comment on or make changes to this bug.