Bug 55859

Summary: [Chromium] The method that retrieves all resources from a page should be moved from Chromium to WebKit
Product: WebKit Reporter: Jay Civelli <jcivelli>
Component: WebKit Misc.Assignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, commit-queue, dglazkov, levin, simonjam, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: PC   
OS: OS X 10.5   
Bug Depends on: 56765, 56914    
Bug Blocks:    
Attachments:
Description Flags
Initial patch
levin: review-
Fix build and style issue
levin: review-
Addressing David's remarks.
none
Fixed style.
abarth: review-
Addressed Adam's comments
none
Fixed style...
levin: review-
Addressed David's comments
none
Patch
none
One more attempt at landing this. none

Description Jay Civelli 2011-03-06 23:46:15 PST
In the Chromium WebKit glue code, dom_operations.[h, cc], a method called GetAllSavableResourceLinksForCurrentPage is used to retrieve all the resources from a page.
It should be moved to WebKit.
One of the key reasons is that we want that method to also report CSS resources, and keeping it in Chromium would required to expose unnecessarily the CSS info through the Chromium WebKit API.
Comment 1 Jay Civelli 2011-03-06 23:59:40 PST
Created attachment 84924 [details]
Initial patch
Comment 2 WebKit Review Bot 2011-03-07 00:04:41 PST
Attachment 84924 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebKit/chromium/ChangeLog', u'Sourc..." exit_code: 1

Last 3072 characters of output:
izerTest.cpp:34:  Alphabetical sorting problem.  [build/include_order] [4]
Source/WebKit/chromium/tests/WebPageSerializerTest.cpp:36:  Found header this file implements after other header. Should be: config.h, primary header, blank line, and then alphabetically sorted.  [build/include_order] [4]
Source/WebKit/chromium/tests/WebPageSerializerTest.cpp:62:  Should only a single space after a punctuation in a comment.  [whitespace/comments] [5]
Source/WebKit/chromium/tests/data/pageserialization/simple_iframe.html:1:  One or more unexpected \r (^M) found; better to use only a \n  [whitespace/carriage_return] [1]
Suppressing further [whitespace/carriage_return] reports for this file.
Source/WebKit/chromium/public/WebPageSerializer.h:77:  The parameter name "view" adds no information, so it should be removed.  [readability/parameter_name] [5]
Source/WebKit/chromium/src/WebPageSerializer.cpp:62:  Use 0 instead of NULL.  [readability/null] [5]
Source/WebKit/chromium/src/WebPageSerializer.cpp:69:  Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side.  [whitespace/operators] [4]
Source/WebKit/chromium/src/WebPageSerializer.cpp:70:  Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side.  [whitespace/operators] [4]
Source/WebKit/chromium/src/WebPageSerializer.cpp:71:  Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side.  [whitespace/operators] [4]
Source/WebKit/chromium/src/WebPageSerializer.cpp:74:  Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side.  [whitespace/operators] [4]
Source/WebKit/chromium/src/WebPageSerializer.cpp:75:  Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side.  [whitespace/operators] [4]
Source/WebKit/chromium/src/WebPageSerializer.cpp:76:  Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side.  [whitespace/operators] [4]
Source/WebKit/chromium/src/WebPageSerializer.cpp:106:  Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side.  [whitespace/operators] [4]
Source/WebKit/chromium/src/WebPageSerializer.cpp:224:  One line control clauses should not use braces.  [whitespace/braces] [4]
Source/WebKit/chromium/tests/data/pageserialization/simple_page.html:1:  One or more unexpected \r (^M) found; better to use only a \n  [whitespace/carriage_return] [1]
Suppressing further [whitespace/carriage_return] reports for this file.
Source/WebKit/chromium/tests/data/pageserialization/top_frame.html:1:  One or more unexpected \r (^M) found; better to use only a \n  [whitespace/carriage_return] [1]
Suppressing further [whitespace/carriage_return] reports for this file.
Total errors found: 266 in 11 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 3 WebKit Review Bot 2011-03-07 00:10:26 PST
Attachment 84924 [details] did not build on chromium:
Build output: http://queues.webkit.org/results/8102494
Comment 4 WebKit Review Bot 2011-03-07 01:03:00 PST
Attachment 84924 [details] did not build on chromium:
Build output: http://queues.webkit.org/results/8107263
Comment 5 Dimitri Glazkov (Google) 2011-03-07 07:08:45 PST
Build breakage?
Comment 6 David Levin 2011-03-07 10:48:11 PST
Comment on attachment 84924 [details]
Initial patch

r-
I may not be the one to r+ but it seems clear that this currently should get an r- due to lots of style issues and it broke the build (and the copyright years say 2010 instead of 2011).
Comment 7 Jay Civelli 2011-03-07 23:22:28 PST
Created attachment 85026 [details]
Fix build and style issue
Comment 8 David Levin 2011-03-15 10:33:31 PDT
Comment on attachment 85026 [details]
Fix build and style issue

View in context: https://bugs.webkit.org/attachment.cgi?id=85026&action=review

Just a few things to address and then this should be fine.

> Source/WebKit/chromium/src/WebDataSourceImpl.cpp:38
> +#include <googleurl/src/gurl.h>

Why do you need to include this?

> Source/WebKit/chromium/src/WebPageSerializer.cpp:34
> +#include "DocumentLoader.h"

Where is the document loaded used in here?

> Source/WebKit/chromium/src/WebPageSerializer.cpp:63
> +    if (element.hasTagName(HTMLNames::imgTag) || element.hasTagName(HTMLNames::scriptTag)) {

Single line clause shouldn't have surrounding {}

> Source/WebKit/chromium/src/WebPageSerializer.cpp:82
> +            // TODO(jnd): Add support for extracting links of sub-resources which

s/TODO({^)]*)/FIXME/

> Source/WebKit/chromium/src/WebPageSerializer.cpp:84
> +            // See bug: http://b/issue?id=1111667.

Internal bug reference doesn't seem appropriate.

> Source/WebKit/chromium/src/WebPageSerializer.cpp:109
> +        if (visitedFrames->find(frame) == notFound)

While find is more flexible, I wonder if there should be a contains method as well which just does this but would be more readable in these pretty common cases.


Reading more of this code, I like this idea more and more.

> Source/WebKit/chromium/src/WebPageSerializer.cpp:120
> +    if (!url.protocolIs("http") && !url.protocolIs("https") && !url.protocolIs("file"))

url.protocolInHTTPFamily
url.protocolIsLocalFile

> Source/WebKit/chromium/src/WebPageSerializer.cpp:163
> +        if (node->isElementNode()) {

if (!node->isElementNode())
    continue;

(Less indenting and "fail fast" approach.)

> Source/WebKit/chromium/tests/WebPageSerializerTest.cpp:55
> +    WebPageSerializerTest() : m_webView(0), m_supportedSchemes(3U)

3U? Do we need the U?

> Source/WebKit/chromium/tests/WebPageSerializerTest.cpp:59
> +        m_supportedSchemes[1] = "file";

Should this be [2] ?

This seems to indicate a hole in the test coverage.  Should it have https resources in it? What about file resources? What about unsupported schemes?
Comment 9 Jay Civelli 2011-03-15 17:02:57 PDT
Created attachment 85879 [details]
Addressing David's remarks.
Comment 10 Jay Civelli 2011-03-15 17:07:33 PDT
(In reply to comment #8)
> (From update of attachment 85026 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=85026&action=review
> 
> Just a few things to address and then this should be fine.
> 
> > Source/WebKit/chromium/src/WebDataSourceImpl.cpp:38
> > +#include <googleurl/src/gurl.h>
> 
> Why do you need to include this?
Removed.

> > Source/WebKit/chromium/src/WebPageSerializer.cpp:34
> > +#include "DocumentLoader.h"
> 
> Where is the document loaded used in here?
At line 134:
KURL frameURL = frame->loader()->documentLoader()->request().url();


> > Source/WebKit/chromium/src/WebPageSerializer.cpp:63
> > +    if (element.hasTagName(HTMLNames::imgTag) || element.hasTagName(HTMLNames::scriptTag)) {
> 
> Single line clause shouldn't have surrounding {}
I thought when following else statement had {} then we should use them on the "then" clause as well?

> > Source/WebKit/chromium/src/WebPageSerializer.cpp:82
> > +            // TODO(jnd): Add support for extracting links of sub-resources which
> 
> s/TODO({^)]*)/FIXME/
Done.

> > Source/WebKit/chromium/src/WebPageSerializer.cpp:84
> > +            // See bug: http://b/issue?id=1111667.
> 
> Internal bug reference doesn't seem appropriate.
Removed.

> > Source/WebKit/chromium/src/WebPageSerializer.cpp:109
> > +        if (visitedFrames->find(frame) == notFound)
> 
> While find is more flexible, I wonder if there should be a contains method as well which just does this but would be more readable in these pretty common cases.
Here I am using a Vector, not a WebVector, and it does not have a contain.
Actually I now realize I am using the WebVector::find() in the test only, and I would probably only need a contains method.  So I replaced find() with contains().

> Reading more of this code, I like this idea more and more.
> 
> > Source/WebKit/chromium/src/WebPageSerializer.cpp:120
> > +    if (!url.protocolIs("http") && !url.protocolIs("https") && !url.protocolIs("file"))
> 
> url.protocolInHTTPFamily
> url.protocolIsLocalFile
Done

> > Source/WebKit/chromium/src/WebPageSerializer.cpp:163
> > +        if (node->isElementNode()) {
> 
> if (!node->isElementNode())
>     continue;
> 
> (Less indenting and "fail fast" approach.)
OK.

> > Source/WebKit/chromium/tests/WebPageSerializerTest.cpp:55
> > +    WebPageSerializerTest() : m_webView(0), m_supportedSchemes(3U)
> 
> 3U? Do we need the U?
Yes, the fails to compile otherwise. (it's using the wrong constructor in that case).

> > Source/WebKit/chromium/tests/WebPageSerializerTest.cpp:59
> > +        m_supportedSchemes[1] = "file";
> 
> Should this be [2] ?
Arg! Yes.

> This seems to indicate a hole in the test coverage.  Should it have https resources in it? What about file resources? What about unsupported schemes?
Good idea, added.
Comment 11 WebKit Review Bot 2011-03-15 17:10:33 PDT
Attachment 85879 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebKit/chromium/ChangeLog', u'Sourc..." exit_code: 1

Source/WebKit/chromium/public/WebVector.h:129:  One or more unexpected \r (^M) found; better to use only a \n  [whitespace/carriage_return] [1]
Suppressing further [whitespace/carriage_return] reports for this file.
Total errors found: 7 in 13 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 12 Jay Civelli 2011-03-15 17:20:12 PDT
Created attachment 85884 [details]
Fixed style.
Comment 13 Adam Barth 2011-03-15 17:33:45 PDT
Comment on attachment 85884 [details]
Fixed style.

View in context: https://bugs.webkit.org/attachment.cgi?id=85884&action=review

WebPageSerializer.cpp is such great sadness.  I re-wrote the whole thing, but never landed the new version.  :(

Almost every line of code in that file is wrong.

> Source/WebKit/chromium/src/WebPageSerializer.cpp:60
> +KURL getSubResourceURLFromElement(const Element& element)

I usually pass Elements by pointer, not by reference.

> Source/WebKit/chromium/src/WebPageSerializer.cpp:92
> +    if (value.isNull() || value.isEmpty() || value.startsWith("javascript:", false))

value.startsWith("javascript:", false) is wrong.  It could start with "   javascript:" also, for example.

> Source/WebKit/chromium/src/WebPageSerializer.cpp:106
> +    if ((element.hasTagName(HTMLNames::iframeTag) || element.hasTagName(HTMLNames::frameTag))
> +            && element.isFrameOwnerElement()) {

What if it's an object or an embed element that has a frame?

> Source/WebKit/chromium/src/WebPageSerializer.cpp:114
> +    if (url.isEmpty() || url.isNull() || !url.isValid())

isEmpty includes isNull, right?
Comment 14 Jay Civelli 2011-03-16 13:53:34 PDT
(In reply to comment #13)
> (From update of attachment 85884 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=85884&action=review
> 
> WebPageSerializer.cpp is such great sadness.  I re-wrote the whole thing, but never landed the new version.  :(
> 
> Almost every line of code in that file is wrong.
And it's only going to get worse, as I am planning to add some more code in that file :-)
I looked at the other file briefly (WebPageSerializerImpl.cpp) and was also concerned. Any reason why you did not land your new version? Anything I can do to help?

> > Source/WebKit/chromium/src/WebPageSerializer.cpp:60
> > +KURL getSubResourceURLFromElement(const Element& element)
> 
> I usually pass Elements by pointer, not by reference.
OK

> > Source/WebKit/chromium/src/WebPageSerializer.cpp:92
> > +    if (value.isNull() || value.isEmpty() || value.startsWith("javascript:", false))
> 
> value.startsWith("javascript:", false) is wrong.  It could start with "   javascript:" also, for example.
OK, now stripping white-space.

> > Source/WebKit/chromium/src/WebPageSerializer.cpp:106
> > +    if ((element.hasTagName(HTMLNames::iframeTag) || element.hasTagName(HTMLNames::frameTag))
> > +            && element.isFrameOwnerElement()) {
> 
> What if it's an object or an embed element that has a frame?
Now supported.

> > Source/WebKit/chromium/src/WebPageSerializer.cpp:114
> > +    if (url.isEmpty() || url.isNull() || !url.isValid())
> 
> isEmpty includes isNull, right?
Oh, yeah.
Comment 15 Jay Civelli 2011-03-16 13:54:21 PDT
Created attachment 85968 [details]
Addressed Adam's comments
Comment 16 WebKit Review Bot 2011-03-16 13:56:11 PDT
Attachment 85968 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebKit/chromium/ChangeLog', u'Sourc..." exit_code: 1

Source/WebKit/chromium/tests/data/pageserialization/object_iframe.html:1:  One or more unexpected \r (^M) found; better to use only a \n  [whitespace/carriage_return] [1]
Suppressing further [whitespace/carriage_return] reports for this file.
Source/WebKit/chromium/src/WebPageSerializer.cpp:90:  One line control clauses should not use braces.  [whitespace/braces] [4]
Source/WebKit/chromium/tests/data/pageserialization/embed_iframe.html:1:  One or more unexpected \r (^M) found; better to use only a \n  [whitespace/carriage_return] [1]
Suppressing further [whitespace/carriage_return] reports for this file.
Total errors found: 11 in 15 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 17 Jay Civelli 2011-03-16 13:59:00 PDT
Created attachment 85970 [details]
Fixed style...
Comment 18 WebKit Review Bot 2011-03-16 14:01:09 PDT
Attachment 85970 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebKit/chromium/ChangeLog', u'Sourc..." exit_code: 1

Source/WebKit/chromium/src/WebPageSerializer.cpp:90:  One line control clauses should not use braces.  [whitespace/braces] [4]
Total errors found: 1 in 15 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 19 David Levin 2011-03-16 14:44:53 PDT
Comment on attachment 85970 [details]
Fixed style...

View in context: https://bugs.webkit.org/attachment.cgi?id=85970&action=review

r- because I want Vector::contains :)

> Source/WebKit/chromium/src/WebPageSerializer.cpp:61
> +{

ASSERT(element);

> Source/WebKit/chromium/src/WebPageSerializer.cpp:63
> +    if (element->hasTagName(HTMLNames::imgTag) || element->hasTagName(HTMLNames::scriptTag)) {

Style bot missed this one but it is still true: "One line control clauses should not use braces.  [whitespace/braces] [4]"

> Source/WebKit/chromium/src/WebPageSerializer.cpp:65
> +    } else if (element->hasTagName(HTMLNames::inputTag)) {

Leave the braces for this if.

> Source/WebKit/chromium/src/WebPageSerializer.cpp:72
> +               || element->hasTagName(HTMLNames::tdTag)) {

Remove them here.

> Source/WebKit/chromium/src/WebPageSerializer.cpp:77
> +               || element->hasTagName(HTMLNames::insTag)) {

Remove them here.

> Source/WebKit/chromium/src/WebPageSerializer.cpp:79
> +    } else if (element->hasTagName(HTMLNames::linkTag)) {

Leave the braces for this if.

> Source/WebKit/chromium/src/WebPageSerializer.cpp:86
> +    } else if (element->hasTagName(HTMLNames::objectTag)) {

Remove them here.

>> Source/WebKit/chromium/src/WebPageSerializer.cpp:90
>> +    }
> 
> One line control clauses should not use braces.  [whitespace/braces] [4]

What Style bot said.

> Source/WebKit/chromium/src/WebPageSerializer.cpp:115
> +            if (visitedFrames->find(frame) == notFound)

May I recommend adding contains to Vector (just implement in terms of find)?

> Source/WebKit/chromium/src/WebPageSerializer.cpp:130
> +    if (resourceURLs->find(url) == notFound)

Use contains.

> Source/WebKit/chromium/src/WebPageSerializer.cpp:159
> +    if (visitedFrames->find(frame) != notFound)

Use contains.

> Source/WebKit/chromium/src/WebPageSerializer.cpp:162
> +    if (frameURLs->find(frameURL) == notFound)

Use contains.
Comment 20 Jay Civelli 2011-03-16 15:24:13 PDT
Created attachment 85987 [details]
Addressed David's comments

Addressed all comments.
Comment 21 David Levin 2011-03-16 15:28:28 PDT
Comment on attachment 85987 [details]
Addressed David's comments

View in context: https://bugs.webkit.org/attachment.cgi?id=85987&action=review

> Source/JavaScriptCore/ChangeLog:10
> +        (WTF::::contains):

Consider fixing the to be the right function names in the future but ok.
Comment 22 WebKit Commit Bot 2011-03-17 12:47:02 PDT
The commit-queue encountered the following flaky tests while processing attachment 85987 [details]:

http/tests/inspector/network/network-size.html bug 56581 (author: pfeldman@chromium.org)
The commit-queue is continuing to process your patch.
Comment 23 WebKit Commit Bot 2011-03-17 12:49:43 PDT
Comment on attachment 85987 [details]
Addressed David's comments

Clearing flags on attachment: 85987

Committed r81377: <http://trac.webkit.org/changeset/81377>
Comment 24 WebKit Commit Bot 2011-03-17 12:49:50 PDT
All reviewed patches have been landed.  Closing bug.
Comment 25 James Simonsen 2011-03-21 13:28:31 PDT
The new unit test from this patch is failing already. This is being rolled out in bug 56765.

The failure is:

[ RUN      ] WebPageSerializerTest.MultipleFrames
.\tests\WebPageSerializerTest.cpp(182): error: Value of: resources.size()
  Actual: 5
Expected: 6
.\tests\WebPageSerializerTest.cpp(186): error: Value of: webVectorContains(resources, "http://www.test.com/music.mid")
  Actual: false
Expected: true
[  FAILED  ] WebPageSerializerTest.MultipleFrames (16 ms)
Comment 26 David Levin 2011-03-21 13:31:37 PDT
Reopened.
Comment 27 Jay Civelli 2011-03-22 10:04:21 PDT
Created attachment 86467 [details]
Patch
Comment 28 Jay Civelli 2011-03-22 10:07:36 PDT
Another attempt at landing this patch.
I am not entirely sure while the WebPageSerializerTest.MultipleFrames test failed, as it passes in both debug and release mode on my box.
There was an unbalanced tag for the resource reported as missing:
<embed src="music.mid"></object>
that I fixed, which should hopefully fix the problem.
Comment 29 WebKit Commit Bot 2011-03-22 12:18:16 PDT
Comment on attachment 86467 [details]
Patch

Clearing flags on attachment: 86467

Committed r81686: <http://trac.webkit.org/changeset/81686>
Comment 30 WebKit Commit Bot 2011-03-22 12:18:24 PDT
All reviewed patches have been landed.  Closing bug.
Comment 31 James Simonsen 2011-03-22 19:51:45 PDT
This broke the main Chrome Win bot again. I don't know why it only affects that one. The Win WebKit Canary and Deps bots run it just fine.
Comment 32 Jay Civelli 2011-03-23 16:21:20 PDT
Created attachment 86718 [details]
One more attempt at landing this.

Not sure why the test fails on one of the bot.
One of the embed src is not reported. This could be caused by a plugin.
Disabling that resource in the test and adding a FIXME: for now.
Comment 33 Jay Civelli 2011-03-23 16:33:56 PDT
Reopening for rerelanding.
Comment 34 WebKit Commit Bot 2011-03-23 19:40:36 PDT
The commit-queue encountered the following flaky tests while processing attachment 86718 [details]:

http/tests/loading/bad-server-subframe.html bug 56985 (author: mjs@apple.com)
The commit-queue is continuing to process your patch.
Comment 35 WebKit Commit Bot 2011-03-23 19:44:15 PDT
Comment on attachment 86718 [details]
One more attempt at landing this.

Clearing flags on attachment: 86718

Committed r81846: <http://trac.webkit.org/changeset/81846>
Comment 36 WebKit Commit Bot 2011-03-23 19:44:21 PDT
All reviewed patches have been landed.  Closing bug.