Bug 22747

Summary: unbounded growth in the cache due to resource overhead
Product: WebKit Reporter: Stephanie Lewis <slewis>
Component: Page LoadingAssignee: Stephanie Lewis <slewis>
Severity: Normal CC: dglazkov
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Mac   
OS: OS X 10.5   
Bug Depends on:    
Bug Blocks: 22969    
Description Flags
Patch to add minimum resource size
patch to set minimum resource size in cache
sam: review-
new patch that calculates some sizes
ggaren: review+
Death by a Million Papercuts Test
Death by a Million Papercuts Test v2
Death by a Million Papercuts Test v3 none

Description Stephanie Lewis 2008-12-08 18:02:51 PST
When running the stress test the WebCore cache shows unbounded growth.   This was discovered to be due to the average size of a resource in the cache becoming < 1KB.  At which point the requests and response headers as well as other overhead (4KB) become prohibitive.

Fix by accounting for the overhead size when calculating how full the cache is.  Next steps are to pare down the size of the overhead and to limit the average size of a resource in the cache.
Comment 1 Stephanie Lewis 2008-12-08 19:01:00 PST
Created attachment 25867 [details]
Patch to add minimum resource size
Comment 2 Stephanie Lewis 2008-12-08 19:20:30 PST
Created attachment 25868 [details]
patch to set minimum resource size in cache

clean patch this time
Comment 3 Sam Weinig 2008-12-08 21:40:06 PST
Comment on attachment 25868 [details]
patch to set minimum resource size in cache

> +unsigned CachedResource::overheadSize() {
The { needs to go on the next line.

> +   
> +    // FIXME: Find some programmatic lighweight way to calculate response size, and size of the different CachedResource classes.  
> +    // This is a rough estimate of resource overhead based on stats collected from the stress test.
> +    return (sizeof(CachedResource) + 192 +  // average size of m_url.
> +           896 * 2 +                        // average size of ResourceResponse.  Doubled to account for the WebCore copy and the WebKit copy. 
> +                                            // Mostly due to the Header Map and URL.
> +           896 +                            // CFNetwork overhead from reading and parsing the headers.
> +           256 * 2 +                        // Overhead from ResourceRequest, doubled to account for WebCore copy and WebKit copy. 
> +                                            // Mostly due to the URL and Header Map.
> +           148);                            // Round up to 4.0KB.

As discussed, let's at least programatically get the sizes for the URL and the ResourceResponse and only apply a multiplier as a last resort.  Also, have you filed bugs on the 2 copies of the ResourceResponse and ResourceRequest (WebCore/WebKit)?  If so, please put a note in the source with the bug number.
Comment 4 Stephanie Lewis 2008-12-10 22:15:31 PST
Created attachment 25939 [details]
new patch that calculates some sizes
Comment 5 Geoffrey Garen 2008-12-11 16:59:39 PST
Comment on attachment 25939 [details]
new patch that calculates some sizes

> -    int delta = resource->size();
> +    int delta = resource->size() + resource->overheadSize();

I think it would be better for size() to automatically add in overheadSize(), just like it adds in decodedSize(). Then, all these client calls can stay the same.

> +    // The object must now be moved to a different queue, since its size may have changed.
> +    // We have to remove explicitly before updating the size, so that we find the correct previous
> +    // queue.
> +    int delta = sizeof(CachedResourceClient*)*m_clients.capacity();

A bunch of sites like this one need a space around the "*".

Because the overhead calculation is an estimate anyway, and it relies on a bunch of assumptions, like the fact that we keep around a ResourceRequest for every resource, and two copies, and the average size of ResourceRequest, this seems like a lot of complexity just to come up with a paradoxically exact estimate. I think it would be a lot simpler, and still solve our bug, if we went with your original approach of a 4K estimate across the board.
Comment 6 Geoffrey Garen 2008-12-11 17:00:56 PST
... If we make substantial improvements to our resource metadata overhead, we can lower the estimate based on testing.
Comment 7 Dimitri Glazkov (Google) 2008-12-18 09:21:01 PST
I am curious: where do we hold on to the ResourceRequest instance? It's accounted for, but I can't seem to find where it's being held.
Comment 8 Geoffrey Garen 2008-12-18 09:34:29 PST
> I am curious: where do we hold on to the ResourceRequest instance? It's
> accounted for, but I can't seem to find where it's being held.

Stephanie is researching that. It may be a leak.
Comment 9 Dimitri Glazkov (Google) 2008-12-18 09:53:58 PST
I am also working on a "death-by-papercuts" test that consistently reproduces this issue (not sure if it'll work, but will mention anyway).

By the way, we don't need to account for an extra ResourceResponse copy in Chromium.
Comment 10 Dimitri Glazkov (Google) 2008-12-18 10:52:18 PST
Created attachment 26124 [details]
Death by a Million Papercuts Test

Here's the test I came up with. I uploaded it as a zip, as I couldn't find a good spot for it in the tree.

Essentially, it injects a batch of tiny image elements (spacer gif) with unique URLs into the document, then overwrites them with another batch and so on. And so on. And so on.

Based on my quick look, the ToT WebKit's memory usage quickly gets out of control (viewed on Intel Mac only). Same problem with Chromium.

Firefox just gives up on trying to loading the images after a while.

Surprisingly, IE7 is the one that handles the test the best, with minimal memory growth.

Comment 11 Dimitri Glazkov (Google) 2008-12-18 12:39:58 PST
Created attachment 26127 [details]
Death by a Million Papercuts Test v2

I updated the test to eliminate potential side-effects of using innerHTML by only creating one set of image elements and then updating their src with unique URLs.

The test still demonstrates the problem, which is good. Or bad? I can never tell.
Comment 12 Stephanie Lewis 2008-12-18 20:16:50 PST
So I looked at the what this test was consuming memory wise.  As expected the biggest problem was the number and size of resource requests.  If you run with either the debug menu or the developer menu open the Web Inspector creates a copy of each of the requests and as more and more resources are requested this becomes really large very quickly.  

Even if you don't have the inspector overhead this test still shows unbounded growth, again because of the resource requests.  According to Anders the lifetime of a Resource Request is tied to the ResourceLoader and is likely to stick around as long as the document sticks around.  If Anders is correct, then even though the cache is correctly throwing away resources, the requests are still around because the document is, and hence the growth.  

The good news is that if you close the window and empty the cache all of the Resource Request objects go away, so they are not leaking.  The bad news is that (contrary to previous belief) a dead cached resource should not be keeping the resource request.  We need to figure out what is keeping those objects around.  

The biggest two pieces of the Resource Request being kept around is from WebCore::KURL::operator NSURL*() const and -[NSURLRequest initWithURL:] so it seems like something in CF, Webkit, or Safari is keeping the object around
Comment 13 Dimitri Glazkov (Google) 2008-12-19 13:10:44 PST
Created attachment 26153 [details]
Death by a Million Papercuts Test v3

Sorry for spamming this bug. I updated the test a little bit.

This time, I removed any possible side-effects of loading being interrupted by another batch: all images must report back with "load" event fired before new batch of cuts is made.

The problem is still present, though it now takes somewhat longer to get there.
Comment 14 Stephanie Lewis 2008-12-22 14:39:37 PST
Closing since original issue has been resolved.  
See bug 22929 for more discussion of the ResourceRequest issue