The WebPageSerializer::retrieveAllResources() does not retrieve CSS resources from the page, such as: @import background urls @font-face
Created attachment 86175 [details] Initial patch
Created attachment 86223 [details] Fixed CSS test
Attachment 86223 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebKit/chromium/ChangeLog', u'Sourc..." exit_code: 1 Source/WebKit/chromium/tests/data/pageserialization/css_test_page.html:15: Line contains tab character. [whitespace/tab] [5] Source/WebKit/chromium/tests/data/pageserialization/css_test_page.html:16: Line contains tab character. [whitespace/tab] [5] Total errors found: 2 in 8 files If any of these errors are false positives, please file a bug against check-webkit-style.
Comment on attachment 86223 [details] Fixed CSS test View in context: https://bugs.webkit.org/attachment.cgi?id=86223&action=review I realize you removed the review request, but here are some random nits. > Source/WebKit/chromium/src/WebPageSerializer.cpp:132 > + // Process the attribute if one was found found. "found found" => "found" > Source/WebKit/chromium/src/WebPageSerializer.cpp:137 > + KURL url(ParsedURLString, element->document()->completeURL(value)); No need to use ParsedURLString. You can just write: KURL url = element->document()->completeURL(value); > Source/WebKit/chromium/src/WebPageSerializer.cpp:143 > + if (!url.isEmpty() && !url.isNull() && url.isValid() > + && (url.protocolIs("http") || url.protocolIs("https") || url.protocolIs("file"))) { > + resourceURLs->add(url); > + } You should do the JavaScript URL check together with these. Also "!url.isEmpty() && !url.isNull() && url.isValid()" are all redundant with the later checks, right? > Source/WebKit/chromium/src/WebPageSerializer.cpp:149 > + CSSStyleDeclaration* styleDeclaration = element->style(); > + if (styleDeclaration) We usually combine these lines: if (CSSStyleDeclaration* styleDeclaration = element->style()) ... > Source/WebKit/chromium/src/WebPageSerializer.cpp:161 > + // If we have already seen that frame, ignore it. This comment is a "what" not a "why" and should be removed. > Source/WebKit/chromium/src/WebPageSerializer.cpp:189 > + HTMLElement* element = toHTMLElement(allNodes->item(i)); > + if (element) { These should be combined. > Source/WebKit/chromium/src/WebPageSerializer.cpp:209 > + // Retrieving the background property would probably be good enough. > + // Iterating for any other image properties to be safe (I am not sure there > + // are actually others). Maybe list bullets can have images? > Source/WebKit/chromium/src/WebPageSerializer.cpp:213 > + // prop ID, but CSSStyleDeclaration only gives access to property names, "prop" <-- please don't use abbreviations. > Source/WebKit/chromium/src/WebPageSerializer.cpp:216 > + RefPtr<CSSValue> value = > + styleDeclaration->getPropertyCSSValue(styleDeclaration->item(i)); There's not 80 col limit. Please don't add unneeded line breaks. > Source/WebKit/chromium/src/WebPageSerializer.cpp:225 > + KURL(ParsedURLString, > + static_cast<CSSStyleSheet*>(styleDeclaration->stylesheet())->document()->completeURL(url))); No need to use ParsedURLString. Doesn't completeURL return a KURL? > Source/WebKit/chromium/src/WebPageSerializer.cpp:240 > + if (!visitedStyleSheets->add(styleSheet).second) > + return; // We have already seen that styleSheet. That's a very strange way to test if we've got the stylesheet. > Source/WebKit/chromium/src/WebPageSerializer.cpp:253 > + resourceURLs->add( > + KURL(ParsedURLString, > + styleSheet->document()->completeURL(importRule->href()))); No ParsedURLString, all on one line. > Source/WebKit/chromium/src/WebPageSerializer.cpp:257 > + retrieveResourcesForCSSStyleSheet( > + static_cast<CSSImportRule*>(item)->styleSheet(), > + visitedStyleSheets, supportedSchemes, resourceURLs); All on one line. > Source/WebKit/chromium/src/WebPageSerializer.cpp:265 > + CSSFontFaceSrcValue* fontFaceSrc = > + static_cast<CSSFontFaceSrcValue*>(valueList->item(j)); one line. :) > Source/WebKit/chromium/src/WebPageSerializer.cpp:315 > + // Now retrieve CSS resources from style-sheets. Again, a what, not a why.
Created attachment 86238 [details] Fixed tests.
(In reply to comment #4) Thanks for the comments! > (From update of attachment 86223 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=86223&action=review > > I realize you removed the review request, but here are some random nits. > > > Source/WebKit/chromium/src/WebPageSerializer.cpp:132 > > + // Process the attribute if one was found found. > > "found found" => "found" Done. > > Source/WebKit/chromium/src/WebPageSerializer.cpp:137 > > + KURL url(ParsedURLString, element->document()->completeURL(value)); > > No need to use ParsedURLString. You can just write: > > KURL url = element->document()->completeURL(value); Done everywhere. > > Source/WebKit/chromium/src/WebPageSerializer.cpp:143 > > + if (!url.isEmpty() && !url.isNull() && url.isValid() > > + && (url.protocolIs("http") || url.protocolIs("https") || url.protocolIs("file"))) { > > + resourceURLs->add(url); > > + } > > You should do the JavaScript URL check together with these. Also "!url.isEmpty() && !url.isNull() && url.isValid()" are all redundant with the later checks, right? Looking at KURL.cpp, it seems we still need isValid(), but you are right we don't need the rest or even to check for the JS protocol. > > Source/WebKit/chromium/src/WebPageSerializer.cpp:149 > > + CSSStyleDeclaration* styleDeclaration = element->style(); > > + if (styleDeclaration) > > We usually combine these lines: > > if (CSSStyleDeclaration* styleDeclaration = element->style()) > ... OK > > > Source/WebKit/chromium/src/WebPageSerializer.cpp:161 > > + // If we have already seen that frame, ignore it. > > This comment is a "what" not a "why" and should be removed. I thought the comment useful as it is not really obvious that ListHashSet::add() returns a std::pair with second being a bool that indicates whether the element was inserted. > > Source/WebKit/chromium/src/WebPageSerializer.cpp:189 > > + HTMLElement* element = toHTMLElement(allNodes->item(i)); > > + if (element) { > > These should be combined. Done. > > Source/WebKit/chromium/src/WebPageSerializer.cpp:209 > > + // Retrieving the background property would probably be good enough. > > + // Iterating for any other image properties to be safe (I am not sure there > > + // are actually others). > > Maybe list bullets can have images? You are right! Updated the command and the test. > > Source/WebKit/chromium/src/WebPageSerializer.cpp:213 > > + // prop ID, but CSSStyleDeclaration only gives access to property names, > > "prop" <-- please don't use abbreviations. OK > > Source/WebKit/chromium/src/WebPageSerializer.cpp:216 > > + RefPtr<CSSValue> value = > > + styleDeclaration->getPropertyCSSValue(styleDeclaration->item(i)); > > There's not 80 col limit. Please don't add unneeded line breaks. Done. > > Source/WebKit/chromium/src/WebPageSerializer.cpp:225 > > + KURL(ParsedURLString, > > + static_cast<CSSStyleSheet*>(styleDeclaration->stylesheet())->document()->completeURL(url))); > > No need to use ParsedURLString. Doesn't completeURL return a KURL? Done. > > Source/WebKit/chromium/src/WebPageSerializer.cpp:240 > > + if (!visitedStyleSheets->add(styleSheet).second) > > + return; // We have already seen that styleSheet. > > That's a very strange way to test if we've got the stylesheet. Yeah, LinkedHashSet:add() inserts and returns a pair, second is a boolean that indicates if the value has been inserted. > > Source/WebKit/chromium/src/WebPageSerializer.cpp:253 > > + resourceURLs->add( > > + KURL(ParsedURLString, > > + styleSheet->document()->completeURL(importRule->href()))); > > No ParsedURLString, all on one line. Done. > > Source/WebKit/chromium/src/WebPageSerializer.cpp:257 > > + retrieveResourcesForCSSStyleSheet( > > + static_cast<CSSImportRule*>(item)->styleSheet(), > > + visitedStyleSheets, supportedSchemes, resourceURLs); > > All on one line. OK. > > Source/WebKit/chromium/src/WebPageSerializer.cpp:265 > > + CSSFontFaceSrcValue* fontFaceSrc = > > + static_cast<CSSFontFaceSrcValue*>(valueList->item(j)); > > one line. :) Done. > > Source/WebKit/chromium/src/WebPageSerializer.cpp:315 > > + // Now retrieve CSS resources from style-sheets. > > Again, a what, not a why. Changed.
Created attachment 86250 [details] Addressed Adam's comments
Comment on attachment 86223 [details] Fixed CSS test View in context: https://bugs.webkit.org/attachment.cgi?id=86223&action=review >>> Source/WebKit/chromium/src/WebPageSerializer.cpp:143 >>> + if (!url.isEmpty() && !url.isNull() && url.isValid() >>> + && (url.protocolIs("http") || url.protocolIs("https") || url.protocolIs("file"))) { >>> + resourceURLs->add(url); >>> + } >> >> You should do the JavaScript URL check together with these. Also "!url.isEmpty() && !url.isNull() && url.isValid()" are all redundant with the later checks, right? > > Looking at KURL.cpp, it seems we still need isValid(), but you are right we don't need the rest or even to check for the JS protocol. Ok. >>> Source/WebKit/chromium/src/WebPageSerializer.cpp:161 >>> + // If we have already seen that frame, ignore it. >> >> This comment is a "what" not a "why" and should be removed. > > I thought the comment useful as it is not really obvious that ListHashSet::add() returns a std::pair with second being a bool that indicates whether the element was inserted. That's a pretty nutty API... It's probably not worth fixing the API in this patch. Feel free to leave the comment if think it's helpful.
Comment on attachment 86250 [details] Addressed Adam's comments View in context: https://bugs.webkit.org/attachment.cgi?id=86250&action=review Does this do an url rewriting? If a page comes from http://www.example.com/ and refers to "http://www.example.com/other.jpg", how does "other.jpg" get loaded from the serialized version? Many places where I ask questions are things that are not obvious to me from looking at the code and may be are indicative of comments that should be added to the code. (Also since I don't know this code well of course some questions have obvious answers to you but not to me :).) > Source/WebKit/chromium/src/WebPageSerializer.cpp:90 > + || element->hasTagName(HTMLNames::objectTag) || element->hasTagName(HTMLNames::embedTag)) The indentation on these two continuation lines seems odd. Typically it should align just inside the parenthesis that is continued and that happens on neither line. > Source/WebKit/chromium/src/WebPageSerializer.cpp:92 > + Frame* frame = static_cast<const HTMLFrameOwnerElement*>(element)->contentFrame(); Why do we continue through the code if the element has that tag name but !element->isFrameOwnerElement? Is this code important to that case: if (CSSStyleDeclaration* styleDeclaration = element->style()) retrieveResourcesForCSSStyleDeclaration(styleDeclaration, resourceURLs); and if so, why isn't it important when element->isFrameOwnerElement ? > Source/WebKit/chromium/src/WebPageSerializer.cpp:101 > if (element->hasTagName(HTMLNames::imgTag) || element->hasTagName(HTMLNames::scriptTag)) I would like this better as a separate function. Then the code would change as well ideally. It would look something like this (and have a name that tells me what it is doing). if (element->hasTagName(HTMLNames::imgTag) || element->hasTagName(HTMLNames::scriptTag)) return &HTMLNames::srcAttr; if (element->hasTagName(HTMLNames::inputTag)) { if (input->isImageButton()) return &HTMLNames::srcAttr; return 0; } > Source/WebKit/chromium/src/WebPageSerializer.cpp:140 > + // Ignore javascript content. How is this ignoring javascript content? > Source/WebKit/chromium/src/WebPageSerializer.cpp:144 > + // does no have a cache mechanism, we skip it as well. s/no/not/ > Source/WebKit/chromium/src/WebPageSerializer.cpp:219 > + // Non cached-image are just place-holders and do not contain data. images (since you used "are", etc.) > Source/WebKit/chromium/src/WebPageSerializer.cpp:222 > + resourceURLs->add(static_cast<CSSStyleSheet*>(styleDeclaration->stylesheet())->document()->completeURL(url)); How do you know that it is a CSSStyleSheet? > Source/WebKit/chromium/src/WebPageSerializer.cpp:250 > + retrieveResourcesForCSSStyleSheet(static_cast<CSSImportRule*>(item)->styleSheet(), visitedStyleSheets, supportedSchemes, resourceURLs); Use importRule and avoid another cast. > Source/WebKit/chromium/src/WebPageSerializer.cpp:257 > + CSSFontFaceSrcValue* fontFaceSrc = static_cast<CSSFontFaceSrcValue*>(valueList->item(j)); How do you know the items are CSSFontFaceSrcValue? > Source/WebKit/chromium/src/WebPageSerializer.cpp:310 > + iter != styleSheets.end(); ++iter) { Alignment (as I mentioned below). > Source/WebKit/chromium/src/WebPageSerializer.cpp:319 > + iter != resourceKURLs.end(); ++iter, i++) { Might as well do ++i for consistency with other places. > Source/WebKit/chromium/src/WebPageSerializer.cpp:330 > + iter != frameKURLs.end(); ++iter, ++i) { Align with interior of ( or just don't wrap the line. Up to you. > Source/WebKit/chromium/tests/WebPageSerializerTest.cpp:53 > + TestWebFrameClient() : m_scriptEnabled(false) {} space in the {} > Source/WebKit/chromium/tests/WebPageSerializerTest.cpp:54 > + virtual bool allowScript(WebFrame*, bool enabledPerSettings) { return m_scriptEnabled; } I would comment out "enabledPerSettings" like /* enabledPerSettings*/ since it is unused that may cause errors in some builds. > Source/WebKit/chromium/tests/WebPageSerializerTest.cpp:181 > + Pls don't do unnecessary space changes. > Source/WebKit/chromium/tests/WebPageSerializerTest.cpp:226 > + m_webView, m_supportedSchemes, &resources, &frames)); You don't need to wrap the line but if it makes you happy...
(In reply to comment #9) > (From update of attachment 86250 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=86250&action=review > > Does this do an url rewriting? No it does not. It simply returns the URLs for the resources found in the page. > If a page comes from http://www.example.com/ and refers to "http://www.example.com/other.jpg", how does "other.jpg" get loaded from the serialized version? This CL does not deal with that. The "save-as" page feature stores the resources from the page into a directory and rewrite the URLs in an extra step. > Many places where I ask questions are things that are not obvious to me from looking at the code and may be are indicative of comments that should be added to the code. (Also since I don't know this code well of course some questions have obvious answers to you but not to me :).) > > > Source/WebKit/chromium/src/WebPageSerializer.cpp:90 > > + || element->hasTagName(HTMLNames::objectTag) || element->hasTagName(HTMLNames::embedTag)) > > The indentation on these two continuation lines seems odd. Typically it should align just inside the parenthesis that is continued and that happens on neither line. OK, I tried to do that right. > > Source/WebKit/chromium/src/WebPageSerializer.cpp:92 > > + Frame* frame = static_cast<const HTMLFrameOwnerElement*>(element)->contentFrame(); > > Why do we continue through the code if the element has that tag name but !element->isFrameOwnerElement? object and embed tags can be used as iframe if they source is HTML, in which case isFrameOwnerElement returns true. Otherwise they should simply be considered as a resource. Note that an iframe can also be used to display an image I believe, and that case does not seem to be supported. I filed a bug https://bugs.webkit.org/show_bug.cgi?id=57300 and will address that separately. > Is this code important to that case: > if (CSSStyleDeclaration* styleDeclaration = element->style()) > retrieveResourcesForCSSStyleDeclaration(styleDeclaration, resourceURLs); > > and if so, why isn't it important when element->isFrameOwnerElement ? I don't think style applies to the iframe tag itself, so we would not get anything interesting from that. > > Source/WebKit/chromium/src/WebPageSerializer.cpp:101 > > if (element->hasTagName(HTMLNames::imgTag) || element->hasTagName(HTMLNames::scriptTag)) > > I would like this better as a separate function. > > Then the code would change as well ideally. > > It would look something like this (and have a name that tells me what it is doing). > > if (element->hasTagName(HTMLNames::imgTag) || element->hasTagName(HTMLNames::scriptTag)) > return &HTMLNames::srcAttr; > > if (element->hasTagName(HTMLNames::inputTag)) { > if (input->isImageButton()) > return &HTMLNames::srcAttr; > return 0; > } Good idea, done. > > Source/WebKit/chromium/src/WebPageSerializer.cpp:140 > > + // Ignore javascript content. > > How is this ignoring javascript content? Out-of-date comment, removed. > > Source/WebKit/chromium/src/WebPageSerializer.cpp:144 > > + // does no have a cache mechanism, we skip it as well. > > s/no/not/ Done. > > Source/WebKit/chromium/src/WebPageSerializer.cpp:219 > > + // Non cached-image are just place-holders and do not contain data. > > images (since you used "are", etc.) Done. > > Source/WebKit/chromium/src/WebPageSerializer.cpp:222 > > + resourceURLs->add(static_cast<CSSStyleSheet*>(styleDeclaration->stylesheet())->document()->completeURL(url)); > > How do you know that it is a CSSStyleSheet? Intuition... What? Not good enough? OK, now explicitly testing for CSS type before casting. > > Source/WebKit/chromium/src/WebPageSerializer.cpp:250 > > + retrieveResourcesForCSSStyleSheet(static_cast<CSSImportRule*>(item)->styleSheet(), visitedStyleSheets, supportedSchemes, resourceURLs); > > Use importRule and avoid another cast. Done. > > Source/WebKit/chromium/src/WebPageSerializer.cpp:257 > > + CSSFontFaceSrcValue* fontFaceSrc = static_cast<CSSFontFaceSrcValue*>(valueList->item(j)); > > How do you know the items are CSSFontFaceSrcValue? Sadly there does not seem to be a reliable way to know. I am doing like WebCore/css/CSSFontSelector.cpp does, which is assume the CSSPropertySrc value only contains CSSFontFaceSrcValue. Added a comment to reflect this. > > Source/WebKit/chromium/src/WebPageSerializer.cpp:310 > > + iter != styleSheets.end(); ++iter) { > > Alignment (as I mentioned below). Done. > > Source/WebKit/chromium/src/WebPageSerializer.cpp:319 > > + iter != resourceKURLs.end(); ++iter, i++) { > > Might as well do ++i for consistency with other places. Done. > > Source/WebKit/chromium/src/WebPageSerializer.cpp:330 > > + iter != frameKURLs.end(); ++iter, ++i) { > > Align with interior of ( > or just don't wrap the line. Up to you. Done. > > Source/WebKit/chromium/tests/WebPageSerializerTest.cpp:53 > > + TestWebFrameClient() : m_scriptEnabled(false) {} > > space in the {} Done. > > Source/WebKit/chromium/tests/WebPageSerializerTest.cpp:54 > > + virtual bool allowScript(WebFrame*, bool enabledPerSettings) { return m_scriptEnabled; } > > I would comment out "enabledPerSettings" like /* enabledPerSettings*/ since it is unused that may cause errors in some builds. Done. > > Source/WebKit/chromium/tests/WebPageSerializerTest.cpp:181 > > + > > Pls don't do unnecessary space changes. Removed. > > Source/WebKit/chromium/tests/WebPageSerializerTest.cpp:226 > > + m_webView, m_supportedSchemes, &resources, &frames)); > > You don't need to wrap the line but if it makes you happy... Unwrapped.
Created attachment 87275 [details] Addressed David's comments
Attachment 87275 [details] did not build on chromium: Build output: http://queues.webkit.org/results/8280139
Created attachment 87277 [details] Fixed build
Comment on attachment 87277 [details] Fixed build Clearing flags on attachment: 87277 Committed r82293: <http://trac.webkit.org/changeset/82293>
All reviewed patches have been landed. Closing bug.
Looks like this might have broken RetrieveCSSResources: http://build.chromium.org/p/chromium.webkit/builders/Webkit%20Win/builds/3137/steps/webkit_unit_tests/logs/RetrieveCSSResources WebPageSerializerTest.RetrieveCSSResources: Did not complete. I'll see if the issue continues.
Happened twice. Rolling out.