Bug 22666 - Clean up data structures used when collecting URLs of subresources for webarchives
Summary: Clean up data structures used when collecting URLs of subresources for webarc...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: 528+ (Nightly build)
Hardware: Mac OS X 10.5
: P2 Normal
Assignee: David Kilzer (:ddkilzer)
URL:
Keywords:
Depends on:
Blocks: 11850
  Show dependency treegraph
 
Reported: 2008-12-04 16:58 PST by David Kilzer (:ddkilzer)
Modified: 2008-12-07 15:20 PST (History)
2 users (show)

See Also:


Attachments
Patch v1 (46.05 KB, patch)
2008-12-05 11:54 PST, David Kilzer (:ddkilzer)
darin: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description David Kilzer (:ddkilzer) 2008-12-04 16:58:07 PST
*SUMMARY
When creating a webarchive from WebCore::LegacyWebArchive::create(), HashSet<String>, Vector<KURL> and Vector<String> were all used to store a list of URLs for resources found in the document.  Instead use a single ListHashSet<KURL> to store the list and resolve the relative URLs as they're added.

A patch for this was attached to Bug 11850 (see Attachment #25753 [details]), but had some issues dealing with empty KURL objects.

See also Bug 11850 Comment #4 for Darin Adler's feedback on the patch.
Comment 1 David Kilzer (:ddkilzer) 2008-12-05 11:54:34 PST
Created attachment 25780 [details]
Patch v1

Proposed patch.

See Bug 11850 Comment #4 for comments on a previous version of this patch.
Comment 2 Darin Adler 2008-12-05 13:18:50 PST
Comment on attachment 25780 [details]
Patch v1

> -    virtual void addSubresourceURLStrings(HashSet<String>&, const String& baseURL) const;
> +    virtual void getSubresourceStyleURLs(ListHashSet<KURL>&, const KURL& baseURL) const;

> -    virtual void addSubresourceURLStrings(HashSet<String>&, const String& baseURL) const { }
> +    virtual void getSubresourceStyleURLs(ListHashSet<KURL>&, const KURL& baseURL) const { }

It seems that if a function is named "get" then it can assume it starts with an empty set and either add to it or replace the set with a whole new one or whatever. But if its job is only to add URLs to an existing set, then it should be named "add". Which kind of functions are these?

> +            ListHashSet<KURL>::iterator kurlIteratorEnd = subresourceURLs.end();
> +            for (ListHashSet<KURL>::iterator kurlIterator = subresourceURLs.begin(); kurlIterator != kurlIteratorEnd; ++kurlIterator) {

I don't think it's good to use "kurl" in this name. Later we'll rename it. Maybe subresourceIterator, or it, or iter?

> +inline void addSubresourceURLForLegacyWebArchive(ListHashSet<KURL>& urls, const KURL& url)

It might be cleaner and inspire less need to include LegacyWebArchive.h if you make this a static member function of the same class that defines the function that has to accumulate the URLs. But I also think it's OK as-is.

I think it's a little strange that this function mentions LegacyWebArchive in its name, but functions like getSubresourceStyleURLs don't. They work as a team.

> +    addSubresourceURLForLegacyWebArchive(urls, document()->completeURL(href()));

You also might want a helper that does the completeURL work. Are you sure you got this right, calling completeURL in all the cases where it's needed and none of the cases where it's already done?

r=me as-is despite the comments above
Comment 3 David Kilzer (:ddkilzer) 2008-12-05 17:24:55 PST
(In reply to comment #2)
> (From update of attachment 25780 [details] [review])
> > -    virtual void addSubresourceURLStrings(HashSet<String>&, const String& baseURL) const;
> > +    virtual void getSubresourceStyleURLs(ListHashSet<KURL>&, const KURL& baseURL) const;
> > -    virtual void addSubresourceURLStrings(HashSet<String>&, const String& baseURL) const { }
> > +    virtual void getSubresourceStyleURLs(ListHashSet<KURL>&, const KURL& baseURL) const { }
> 
> It seems that if a function is named "get" then it can assume it starts with an
> empty set and either add to it or replace the set with a whole new one or
> whatever. But if its job is only to add URLs to an existing set, then it should
> be named "add". Which kind of functions are these?

They're all methods that add to the list.  I'll change them back from get...() to add...().

Technically, they could be all add() methods since each one adds URLs to the list, but I suppose we want to make Node::getSubresourceURLs() stand out since we want people to call it instead.

> > +            ListHashSet<KURL>::iterator kurlIteratorEnd = subresourceURLs.end();
> > +            for (ListHashSet<KURL>::iterator kurlIterator = subresourceURLs.begin(); kurlIterator != kurlIteratorEnd; ++kurlIterator) {
> 
> I don't think it's good to use "kurl" in this name. Later we'll rename it.
> Maybe subresourceIterator, or it, or iter?

There was already an "it" iterator in scope.  I considered "it2", but preferred something more descriptive.  Maybe "listIt" or "listIter" or "listIterator".  Would "jt" be too subtle?  :)

I think I'll just go with "it2".

> > +inline void addSubresourceURLForLegacyWebArchive(ListHashSet<KURL>& urls, const KURL& url)
> 
> It might be cleaner and inspire less need to include LegacyWebArchive.h if you
> make this a static member function of the same class that defines the function
> that has to accumulate the URLs. But I also think it's OK as-is.

Put it into WebCore::Node?  I'll look at doing that.

> I think it's a little strange that this function mentions LegacyWebArchive in
> its name, but functions like getSubresourceStyleURLs don't. They work as a
> team.
> 
> > +    addSubresourceURLForLegacyWebArchive(urls, document()->completeURL(href()));

I'll take out the "ForLegacyWebArchive" part.

> You also might want a helper that does the completeURL work. Are you sure you
> got this right, calling completeURL in all the cases where it's needed and none
> of the cases where it's already done?

I'll look through all the new cases by hand.  There were only a dozen or two of them.

Note that the old code did this for every single URL returned from getSubresourceAttributeStrings(), so it was doing much more work than it needed to.  (I was basing the need to use document->completeURL() on whether the method returned a String versus a KURL.  Will do more research as noted above before landing this.)

Comment 4 David Kilzer (:ddkilzer) 2008-12-05 17:38:40 PST
(In reply to comment #3)
> I think I'll just go with "it2".

No, "iter" and "iterEnd".

Comment 5 David Kilzer (:ddkilzer) 2008-12-06 02:58:20 PST
(In reply to comment #3)
> (In reply to comment #2)
> > (From update of attachment 25780 [details] [review] [review])
> > > +inline void addSubresourceURLForLegacyWebArchive(ListHashSet<KURL>& urls, const KURL& url)
> > 
> > It might be cleaner and inspire less need to include LegacyWebArchive.h if you
> > make this a static member function of the same class that defines the function
> > that has to accumulate the URLs. But I also think it's OK as-is.
> 
> Put it into WebCore::Node?  I'll look at doing that.

Note that the style subsystem will also be adding KURL objects to a ListHashSet<KURL> soon, so I can't make it a static member of WebCore::Node.  (Moving the method to Node.h does make it available without #including LegacyWebArchive.h everywhere, though.)

> > You also might want a helper that does the completeURL work. Are you sure you
> > got this right, calling completeURL in all the cases where it's needed and none
> > of the cases where it's already done?
> 
> I'll look through all the new cases by hand.  There were only a dozen or two of
> them.

After reviewing the changes, I don't think we can guarantee anything when a method (like href() in StyleSheet or the SVG elements) returns a String instead of a KURL.  We should convert such methods to return a KURL object if we want that guarantee.

At worst, Document::completeURL() might be called on a fully-qualified URL and thus have no effect.  I believe the existing code in ToT WebKit does a LOT more double-processing of fully-qualified URLs than the new code will.

Comment 6 David Kilzer (:ddkilzer) 2008-12-06 03:55:17 PST
$ git svn dcommit
Committing to http://svn.webkit.org/repository/webkit/trunk ...
	M	WebCore/ChangeLog
	M	WebCore/WebCore.base.exp
	M	WebCore/WebCore.xcodeproj/project.pbxproj
	M	WebCore/css/CSSStyleSheet.cpp
	M	WebCore/css/CSSStyleSheet.h
	M	WebCore/css/StyleSheet.h
	M	WebCore/dom/Node.cpp
	M	WebCore/dom/Node.h
	M	WebCore/dom/ProcessingInstruction.cpp
	M	WebCore/dom/ProcessingInstruction.h
	M	WebCore/html/HTMLBodyElement.cpp
	M	WebCore/html/HTMLBodyElement.h
	M	WebCore/html/HTMLEmbedElement.cpp
	M	WebCore/html/HTMLEmbedElement.h
	M	WebCore/html/HTMLImageElement.cpp
	M	WebCore/html/HTMLImageElement.h
	M	WebCore/html/HTMLInputElement.cpp
	M	WebCore/html/HTMLInputElement.h
	M	WebCore/html/HTMLLinkElement.cpp
	M	WebCore/html/HTMLLinkElement.h
	M	WebCore/html/HTMLObjectElement.cpp
	M	WebCore/html/HTMLObjectElement.h
	M	WebCore/html/HTMLParamElement.cpp
	M	WebCore/html/HTMLParamElement.h
	M	WebCore/html/HTMLScriptElement.cpp
	M	WebCore/html/HTMLScriptElement.h
	M	WebCore/html/HTMLStyleElement.cpp
	M	WebCore/html/HTMLStyleElement.h
	M	WebCore/html/HTMLTableCellElement.cpp
	M	WebCore/html/HTMLTableCellElement.h
	M	WebCore/html/HTMLTableElement.cpp
	M	WebCore/html/HTMLTableElement.h
	M	WebCore/loader/archive/cf/LegacyWebArchive.cpp
	M	WebCore/svg/SVGCursorElement.cpp
	M	WebCore/svg/SVGCursorElement.h
	M	WebCore/svg/SVGFEImageElement.cpp
	M	WebCore/svg/SVGFEImageElement.h
	M	WebCore/svg/SVGImageElement.cpp
	M	WebCore/svg/SVGImageElement.h
	M	WebCore/svg/SVGScriptElement.cpp
	M	WebCore/svg/SVGScriptElement.h
	M	WebKit/mac/ChangeLog
	M	WebKit/mac/DOM/WebDOMOperations.mm
Committed r39065

http://trac.webkit.org/changeset/39065

Comment 7 Bernhard Rosenkraenzer 2008-12-06 05:53:15 PST
This commit introduces bug 22711
Comment 8 David Kilzer (:ddkilzer) 2008-12-06 09:17:27 PST
(In reply to comment #7)
> This commit introduces bug 22711

Build fix:

$ git svn dcommit
Committing to http://svn.webkit.org/repository/webkit/trunk ...
	M	WebCore/ChangeLog
	M	WebCore/svg/SVGFEImageElement.cpp
Committed r39067

Comment 9 Darin Adler 2008-12-07 15:20:54 PST
(In reply to comment #5)
> After reviewing the changes, I don't think we can guarantee anything when a
> method (like href() in StyleSheet or the SVG elements) returns a String instead
> of a KURL.  We should convert such methods to return a KURL object if we want
> that guarantee.

Your choice seems OK, but I'm not sure your logic above is exactly right.

Many of those functions are part of the public DOM interface to those objects, so they have to return a String unless we teach the bindings generation machinery to turn a KURL back into a String. The functions might be guaranteed to return a String that's already a complete URL, though, and I could imagine us writing code that's aware of that even if the type system isn't how we communicate the guarantee.