Bug 49548

Summary: WebCore cache stores duplicate copies of subresources with URL fragments
Product: WebKit Reporter: David Kilzer (:ddkilzer) <ddkilzer>
Component: WebCore Misc.Assignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: ap, buildbot, eric, koivisto
Priority: P2 Keywords: HasReduction, InRadar
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Attachments:
Description Flags
Test case (zip archive)
none
patch ap: review+

Description David Kilzer (:ddkilzer) 2010-11-15 10:31:12 PST
* SUMMARY
The WebCore cache stores duplicate copies of subresources when they contain different URL fragments--even though the resource is fetched without the URL fragment from the network only once.  (In the case of Safari, CFNetwork is stripping off the URL fragment before fetching it, and is caching the result.)

This nominally only affects SVG fonts since they're the only resource that uses a URL fragment, but adding URL fragments to other subresources also demonstrates this behavior.

* STEPS TO REPRODUCE
1. Download, unzip and open the attached test case (svgfont.html) in Safari.

* EXPECTED RESULTS
One copy of bg_pattern.jpg and ABCFont.svg should be stored in the WebCore cache.

* ACTUAL RESULTS
Multiple copies of bg_pattern.jpg and ABCFont.svg are stored in the WebCore cache.

* REGRESSION
Unknown.  I believe the WebCore cache may have had this issue for a long time.
Comment 1 David Kilzer (:ddkilzer) 2010-11-15 10:32:22 PST
Created attachment 73909 [details]
Test case (zip archive)
Comment 2 David Kilzer (:ddkilzer) 2010-11-15 10:33:50 PST
<rdar://problem/8668586>
Comment 3 Antti Koivisto 2010-12-14 13:53:49 PST
Created attachment 76566 [details]
patch

- Strip fragment identifiers from HTTP and file URLs for the memory cache.
- Changed some CachedResourceLoader and MemoryCache interfaces to use KURLs
  instead of strings to reduce repeated URL parsing.
Comment 4 Alexey Proskuryakov 2010-12-14 14:24:03 PST
Comment on attachment 76566 [details]
patch

View in context: https://bugs.webkit.org/attachment.cgi?id=76566&action=review

r=me. Please double check for invalid URLs being passed around - I don't think that we have a good general policy of who should check for those, so they can creep up.

Can any assertions for !hasFragmentIdentifier() be added in code that's behind MemoryCache interface? I think that these could be quite valuable.

> WebCore/loader/cache/CachedResource.cpp:119
> -    ASSERT(url().isNull() || cache()->resourceForURL(url()) != this);
> +    ASSERT(url().isNull() || cache()->resourceForURL(KURL(ParsedURLString, url())) != this);

Is url() always a valid parsed URL string here? Otherwise, KURL(ParsedURLString, url()) can assert.

> WebCore/loader/cache/CachedResourceLoader.cpp:160
> +    return cache()->requestUserCSSStyleSheet(this, KURL(ParsedURLString, url), charset);

Is url guaranteed to be a parsed URL string? If it is, perhaps the caller could pass a KURL object?

This code looks dangerous. Note that we use completeURL() in other CachedResourceLoader methods.

> WebCore/loader/cache/MemoryCache.cpp:225
> +    if (!originalURL.hasFragmentIdentifier())
> +        return originalURL;

I would add a comment explaining that invalid URLs are returned unchanged. Perhaps even to a declaration in header, not sure.

> WebCore/loader/cache/MemoryCache.cpp:227
> +    // Data urls must be unmodified and it is also safer to keep them for custom protocols.

Isn't this a bug in data URL parsing? Sounds like these shouldn't have fragments.
Comment 5 Eric Seidel (no email) 2010-12-14 15:14:01 PST
Attachment 76566 [details] was posted by a committer and has review+, assigning to Antti Koivisto for commit.
Comment 6 Antti Koivisto 2010-12-14 15:42:36 PST
(In reply to comment #4)
> (From update of attachment 76566 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=76566&action=review
> 
> r=me. Please double check for invalid URLs being passed around - I don't think that we have a good general policy of who should check for those, so they can creep up.

Ok.

> Can any assertions for !hasFragmentIdentifier() be added in code that's behind MemoryCache interface? I think that these could be quite valuable.

I wouldn't be surprised if there were WebKit clients using custom protocols that expect to see everything in the request URL. To be safe, this patch still allow frag ids in the cache so simple !hasFragmentIdentifier() would not work. I'll see if there is some other asserts that could be useful.

> > WebCore/loader/cache/CachedResource.cpp:119
> > -    ASSERT(url().isNull() || cache()->resourceForURL(url()) != this);
> > +    ASSERT(url().isNull() || cache()->resourceForURL(KURL(ParsedURLString, url())) != this);
> 
> Is url() always a valid parsed URL string here? Otherwise, KURL(ParsedURLString, url()) can assert.

If not that is likely to be a bug that we want to catch. I was thinking of switching CachedResource to KURL too later.

> > WebCore/loader/cache/CachedResourceLoader.cpp:160
> > +    return cache()->requestUserCSSStyleSheet(this, KURL(ParsedURLString, url), charset);
> 
> Is url guaranteed to be a parsed URL string? If it is, perhaps the caller could pass a KURL object?
>
> This code looks dangerous. Note that we use completeURL() in other CachedResourceLoader methods.

It seems it will only ever gets invoked if you have a user stylesheets with @import rules. I like that code ends up guaranteeing that the url has been parsed and resolved absolute, though it is rather messy.

I'll see about passing KURL.

> > WebCore/loader/cache/MemoryCache.cpp:227
> > +    // Data urls must be unmodified and it is also safer to keep them for custom protocols.
> 
> Isn't this a bug in data URL parsing? Sounds like these shouldn't have fragments.

Yeah, I think so. Data and javascript URLs (and maybe others) should never have fragment identifiers. That seemed like something better fixed separately as it might expand quite a bit.
Comment 7 Build Bot 2010-12-14 20:59:05 PST
Attachment 76566 [details] did not build on win:
Build output: http://queues.webkit.org/results/7118009
Comment 8 Antti Koivisto 2010-12-15 03:28:24 PST
http://trac.webkit.org/changeset/74107