RESOLVED FIXED 22666
Clean up data structures used when collecting URLs of subresources for webarchives
https://bugs.webkit.org/show_bug.cgi?id=22666
Summary Clean up data structures used when collecting URLs of subresources for webarc...
David Kilzer (:ddkilzer)
Reported 2008-12-04 16:58:07 PST
*SUMMARY When creating a webarchive from WebCore::LegacyWebArchive::create(), HashSet<String>, Vector<KURL> and Vector<String> were all used to store a list of URLs for resources found in the document. Instead use a single ListHashSet<KURL> to store the list and resolve the relative URLs as they're added. A patch for this was attached to Bug 11850 (see Attachment #25753 [details]), but had some issues dealing with empty KURL objects. See also Bug 11850 Comment #4 for Darin Adler's feedback on the patch.
Attachments
Patch v1 (46.05 KB, patch)
2008-12-05 11:54 PST, David Kilzer (:ddkilzer)
darin: review+
David Kilzer (:ddkilzer)
Comment 1 2008-12-05 11:54:34 PST
Created attachment 25780 [details] Patch v1 Proposed patch. See Bug 11850 Comment #4 for comments on a previous version of this patch.
Darin Adler
Comment 2 2008-12-05 13:18:50 PST
Comment on attachment 25780 [details] Patch v1 > - virtual void addSubresourceURLStrings(HashSet<String>&, const String& baseURL) const; > + virtual void getSubresourceStyleURLs(ListHashSet<KURL>&, const KURL& baseURL) const; > - virtual void addSubresourceURLStrings(HashSet<String>&, const String& baseURL) const { } > + virtual void getSubresourceStyleURLs(ListHashSet<KURL>&, const KURL& baseURL) const { } It seems that if a function is named "get" then it can assume it starts with an empty set and either add to it or replace the set with a whole new one or whatever. But if its job is only to add URLs to an existing set, then it should be named "add". Which kind of functions are these? > + ListHashSet<KURL>::iterator kurlIteratorEnd = subresourceURLs.end(); > + for (ListHashSet<KURL>::iterator kurlIterator = subresourceURLs.begin(); kurlIterator != kurlIteratorEnd; ++kurlIterator) { I don't think it's good to use "kurl" in this name. Later we'll rename it. Maybe subresourceIterator, or it, or iter? > +inline void addSubresourceURLForLegacyWebArchive(ListHashSet<KURL>& urls, const KURL& url) It might be cleaner and inspire less need to include LegacyWebArchive.h if you make this a static member function of the same class that defines the function that has to accumulate the URLs. But I also think it's OK as-is. I think it's a little strange that this function mentions LegacyWebArchive in its name, but functions like getSubresourceStyleURLs don't. They work as a team. > + addSubresourceURLForLegacyWebArchive(urls, document()->completeURL(href())); You also might want a helper that does the completeURL work. Are you sure you got this right, calling completeURL in all the cases where it's needed and none of the cases where it's already done? r=me as-is despite the comments above
David Kilzer (:ddkilzer)
Comment 3 2008-12-05 17:24:55 PST
(In reply to comment #2) > (From update of attachment 25780 [details] [review]) > > - virtual void addSubresourceURLStrings(HashSet<String>&, const String& baseURL) const; > > + virtual void getSubresourceStyleURLs(ListHashSet<KURL>&, const KURL& baseURL) const; > > - virtual void addSubresourceURLStrings(HashSet<String>&, const String& baseURL) const { } > > + virtual void getSubresourceStyleURLs(ListHashSet<KURL>&, const KURL& baseURL) const { } > > It seems that if a function is named "get" then it can assume it starts with an > empty set and either add to it or replace the set with a whole new one or > whatever. But if its job is only to add URLs to an existing set, then it should > be named "add". Which kind of functions are these? They're all methods that add to the list. I'll change them back from get...() to add...(). Technically, they could be all add() methods since each one adds URLs to the list, but I suppose we want to make Node::getSubresourceURLs() stand out since we want people to call it instead. > > + ListHashSet<KURL>::iterator kurlIteratorEnd = subresourceURLs.end(); > > + for (ListHashSet<KURL>::iterator kurlIterator = subresourceURLs.begin(); kurlIterator != kurlIteratorEnd; ++kurlIterator) { > > I don't think it's good to use "kurl" in this name. Later we'll rename it. > Maybe subresourceIterator, or it, or iter? There was already an "it" iterator in scope. I considered "it2", but preferred something more descriptive. Maybe "listIt" or "listIter" or "listIterator". Would "jt" be too subtle? :) I think I'll just go with "it2". > > +inline void addSubresourceURLForLegacyWebArchive(ListHashSet<KURL>& urls, const KURL& url) > > It might be cleaner and inspire less need to include LegacyWebArchive.h if you > make this a static member function of the same class that defines the function > that has to accumulate the URLs. But I also think it's OK as-is. Put it into WebCore::Node? I'll look at doing that. > I think it's a little strange that this function mentions LegacyWebArchive in > its name, but functions like getSubresourceStyleURLs don't. They work as a > team. > > > + addSubresourceURLForLegacyWebArchive(urls, document()->completeURL(href())); I'll take out the "ForLegacyWebArchive" part. > You also might want a helper that does the completeURL work. Are you sure you > got this right, calling completeURL in all the cases where it's needed and none > of the cases where it's already done? I'll look through all the new cases by hand. There were only a dozen or two of them. Note that the old code did this for every single URL returned from getSubresourceAttributeStrings(), so it was doing much more work than it needed to. (I was basing the need to use document->completeURL() on whether the method returned a String versus a KURL. Will do more research as noted above before landing this.)
David Kilzer (:ddkilzer)
Comment 4 2008-12-05 17:38:40 PST
(In reply to comment #3) > I think I'll just go with "it2". No, "iter" and "iterEnd".
David Kilzer (:ddkilzer)
Comment 5 2008-12-06 02:58:20 PST
(In reply to comment #3) > (In reply to comment #2) > > (From update of attachment 25780 [details] [review] [review]) > > > +inline void addSubresourceURLForLegacyWebArchive(ListHashSet<KURL>& urls, const KURL& url) > > > > It might be cleaner and inspire less need to include LegacyWebArchive.h if you > > make this a static member function of the same class that defines the function > > that has to accumulate the URLs. But I also think it's OK as-is. > > Put it into WebCore::Node? I'll look at doing that. Note that the style subsystem will also be adding KURL objects to a ListHashSet<KURL> soon, so I can't make it a static member of WebCore::Node. (Moving the method to Node.h does make it available without #including LegacyWebArchive.h everywhere, though.) > > You also might want a helper that does the completeURL work. Are you sure you > > got this right, calling completeURL in all the cases where it's needed and none > > of the cases where it's already done? > > I'll look through all the new cases by hand. There were only a dozen or two of > them. After reviewing the changes, I don't think we can guarantee anything when a method (like href() in StyleSheet or the SVG elements) returns a String instead of a KURL. We should convert such methods to return a KURL object if we want that guarantee. At worst, Document::completeURL() might be called on a fully-qualified URL and thus have no effect. I believe the existing code in ToT WebKit does a LOT more double-processing of fully-qualified URLs than the new code will.
David Kilzer (:ddkilzer)
Comment 6 2008-12-06 03:55:17 PST
$ git svn dcommit Committing to http://svn.webkit.org/repository/webkit/trunk ... M WebCore/ChangeLog M WebCore/WebCore.base.exp M WebCore/WebCore.xcodeproj/project.pbxproj M WebCore/css/CSSStyleSheet.cpp M WebCore/css/CSSStyleSheet.h M WebCore/css/StyleSheet.h M WebCore/dom/Node.cpp M WebCore/dom/Node.h M WebCore/dom/ProcessingInstruction.cpp M WebCore/dom/ProcessingInstruction.h M WebCore/html/HTMLBodyElement.cpp M WebCore/html/HTMLBodyElement.h M WebCore/html/HTMLEmbedElement.cpp M WebCore/html/HTMLEmbedElement.h M WebCore/html/HTMLImageElement.cpp M WebCore/html/HTMLImageElement.h M WebCore/html/HTMLInputElement.cpp M WebCore/html/HTMLInputElement.h M WebCore/html/HTMLLinkElement.cpp M WebCore/html/HTMLLinkElement.h M WebCore/html/HTMLObjectElement.cpp M WebCore/html/HTMLObjectElement.h M WebCore/html/HTMLParamElement.cpp M WebCore/html/HTMLParamElement.h M WebCore/html/HTMLScriptElement.cpp M WebCore/html/HTMLScriptElement.h M WebCore/html/HTMLStyleElement.cpp M WebCore/html/HTMLStyleElement.h M WebCore/html/HTMLTableCellElement.cpp M WebCore/html/HTMLTableCellElement.h M WebCore/html/HTMLTableElement.cpp M WebCore/html/HTMLTableElement.h M WebCore/loader/archive/cf/LegacyWebArchive.cpp M WebCore/svg/SVGCursorElement.cpp M WebCore/svg/SVGCursorElement.h M WebCore/svg/SVGFEImageElement.cpp M WebCore/svg/SVGFEImageElement.h M WebCore/svg/SVGImageElement.cpp M WebCore/svg/SVGImageElement.h M WebCore/svg/SVGScriptElement.cpp M WebCore/svg/SVGScriptElement.h M WebKit/mac/ChangeLog M WebKit/mac/DOM/WebDOMOperations.mm Committed r39065 http://trac.webkit.org/changeset/39065
Bernhard Rosenkraenzer
Comment 7 2008-12-06 05:53:15 PST
This commit introduces bug 22711
David Kilzer (:ddkilzer)
Comment 8 2008-12-06 09:17:27 PST
(In reply to comment #7) > This commit introduces bug 22711 Build fix: $ git svn dcommit Committing to http://svn.webkit.org/repository/webkit/trunk ... M WebCore/ChangeLog M WebCore/svg/SVGFEImageElement.cpp Committed r39067
Darin Adler
Comment 9 2008-12-07 15:20:54 PST
(In reply to comment #5) > After reviewing the changes, I don't think we can guarantee anything when a > method (like href() in StyleSheet or the SVG elements) returns a String instead > of a KURL. We should convert such methods to return a KURL object if we want > that guarantee. Your choice seems OK, but I'm not sure your logic above is exactly right. Many of those functions are part of the public DOM interface to those objects, so they have to return a String unless we teach the bindings generation machinery to turn a KURL back into a String. The functions might be guaranteed to return a String that's already a complete URL, though, and I could imagine us writing code that's aware of that even if the type system isn't how we communicate the guarantee.
Note You need to log in before you can comment on or make changes to this bug.