Bug 61894 - DocumentLoader keeps a reference to all URL strings ever loaded in m_resourcesClientKnowsAbout leading to lots of memory waste
Summary: DocumentLoader keeps a reference to all URL strings ever loaded in m_resource...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: James Robinson
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-06-01 17:14 PDT by James Robinson
Modified: 2011-07-22 13:18 PDT (History)
12 users (show)

See Also:


Attachments
Patch (3.39 KB, patch)
2011-06-01 20:02 PDT, James Robinson
no flags Details | Formatted Diff | Diff
guards entire functions + callsites (5.90 KB, patch)
2011-06-01 22:26 PDT, James Robinson
no flags Details | Formatted Diff | Diff
generate large images via data urls (2.63 KB, text/html)
2011-07-18 12:26 PDT, Scott Graham
no flags Details
exclude data urls from m_resourcesClientKnowsAbout (2.24 KB, patch)
2011-07-18 12:33 PDT, Scott Graham
no flags Details | Formatted Diff | Diff
exclude data urls from m_resourcesClientKnowsAbout (except on MAC pending bug 61904) (2.21 KB, patch)
2011-07-22 08:32 PDT, Scott Graham
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description James Robinson 2011-06-01 17:14:22 PDT
DocumentLoader keeps a reference to all URL strings ever loaded leading to lots of memory waste
Comment 1 James Robinson 2011-06-01 17:48:09 PDT
This set is used in FrameLoader::loadedResourceFromMemoryCache() to control whether the client notifications are sent for loads that go through this codepath.  Possible fixes for the memory waste are:

1.) Preserve current behavior exactly and use a more compact data structure to determine whether a given URL is in the set of URLs the client knows about, for example a HashSet of MD5 hashes.
2.) Decide that we don't need the current behavior for data: URLs and exclude them from the set since these URLs are generally large and contribute more to WebKit's excessive memory use than other types of URLs.

The client call this set effects is dispatchDidLoadResourceFromMemoryCache, which is notImplemented() in every in-tree port except for the Mac port (not WebKit2) and Chromium.
Comment 2 James Robinson 2011-06-01 20:02:29 PDT
Created attachment 95708 [details]
Patch
Comment 3 James Robinson 2011-06-01 20:03:42 PDT
Patch posted for consideration.  If someone more familiar with the details of the mac port's use of this client API can suggest a better solution for that port that would be excellent, but in the meantime I think it's a clear win to fix this leak for everyone else.
Comment 4 Eric Seidel (no email) 2011-06-01 21:15:34 PDT
Comment on attachment 95708 [details]
Patch

LGTM.  You should file a separate bug for the mac port, or possibly repurpose this one.
Comment 5 Eric Seidel (no email) 2011-06-01 21:15:52 PDT
Bradee has historically worked on the mac port's loader.
Comment 6 James Robinson 2011-06-01 22:04:55 PDT
Filed https://bugs.webkit.org/show_bug.cgi?id=61904 for Mac.
Comment 7 Brady Eidson 2011-06-01 22:11:12 PDT
Comment on attachment 95708 [details]
Patch

Especially since we don't even need it in WK2, we should be able to get away from this in WK1 too.

But clearly we can't simply remove this concept until someone explores the Mac port's usage of it.

I'd suggest a better patch for now that would be a greater win for WebCore and all ports as a whole, in readability and prepping for future removal of this code:
- File bugzilla for the Mac port exploration as Eric suggested
- In DocumentLoader.h, put didTellClientAboutLoad, haveToldClientAboutLoad, and m_resourcesClientKnowsAbout in a single #if PLATFORM(MAC) block.  Note that this means for non-Mac platforms, they will be compiled out and not exist.
- Put an #if PLATFORM(MAC) block at the call sites of the two methods.  Without looking at code, I bet this will only be a few places. (you probably already know)
- Take these 3 or 4 #if PLATFORM(MAC) sites with a FIXME that includes the bugzilla you filed.
Comment 8 Brady Eidson 2011-06-01 22:13:28 PDT
Heh.  Apparently I can't clear the commit-queue flag even though I r-'ed, and now I can't even change the review or commit-queue flags.

We'll see if the commit-queue lands an r-'ed patch  :)

(BTW, I did think it was worth an r- because of how much fracture this adds to DocumentLoader.h)
Comment 9 James Robinson 2011-06-01 22:16:49 PDT
(In reply to comment #7)
> (From update of attachment 95708 [details])
> Especially since we don't even need it in WK2, we should be able to get away from this in WK1 too.
> 
> But clearly we can't simply remove this concept until someone explores the Mac port's usage of it.

I've explored it to the extent that I'm able, but all I can tell is that it's not needed for anything except for Mac WK1 and the Mac WK1 usage is black magic to me.


> 
> I'd suggest a better patch for now that would be a greater win for WebCore and all ports as a whole, in readability and prepping for future removal of this code:
> - File bugzilla for the Mac port exploration as Eric suggested

Done - https://bugs.webkit.org/show_bug.cgi?id=61904

> - In DocumentLoader.h, put didTellClientAboutLoad, haveToldClientAboutLoad, and m_resourcesClientKnowsAbout in a single #if PLATFORM(MAC) block.  Note that this means for non-Mac platforms, they will be compiled out and not exist.
> - Put an #if PLATFORM(MAC) block at the call sites of the two methods.  Without looking at code, I bet this will only be a few places. (you probably already know)
> - Take these 3 or 4 #if PLATFORM(MAC) sites with a FIXME that includes the bugzilla you filed.

These sound nice but the resolution depends on how https://bugs.webkit.org/show_bug.cgi?id=61904 is resolved, which isn't something I can't really handle personally.  I hope that we can just delete this all.  In the meantime, however, it seems unfair to penalize all of the other ports (and WK2) with this.
Comment 10 James Robinson 2011-06-01 22:17:08 PDT
Comment on attachment 95708 [details]
Patch

clearin' cq
Comment 11 James Robinson 2011-06-01 22:26:57 PDT
Created attachment 95722 [details]
guards entire functions + callsites
Comment 12 James Robinson 2011-06-01 22:27:41 PDT
Added ifdefs to the callsites as suggested.  It's more #ifdef-age, but I agree that it's a bit cleaner.
Comment 13 Brady Eidson 2011-06-01 22:43:22 PDT
(In reply to comment #9)
> (In reply to comment #7)
> > (From update of attachment 95708 [details] [details])
> > Especially since we don't even need it in WK2, we should be able to get away from this in WK1 too.
> > 
> > But clearly we can't simply remove this concept until someone explores the Mac port's usage of it.
> 
> I've explored it to the extent that I'm able, but all I can tell is that it's not needed for anything except for Mac WK1 and the Mac WK1 usage is black magic to me.

I meant "until someone familiar with it explores the Mac port's usage", not you  :)


> 
> > - In DocumentLoader.h, put didTellClientAboutLoad, haveToldClientAboutLoad, and m_resourcesClientKnowsAbout in a single #if PLATFORM(MAC) block.  Note that this means for non-Mac platforms, they will be compiled out and not exist.
> > - Put an #if PLATFORM(MAC) block at the call sites of the two methods.  Without looking at code, I bet this will only be a few places. (you probably already know)
> > - Take these 3 or 4 #if PLATFORM(MAC) sites with a FIXME that includes the bugzilla you filed.
> 
> These sound nice but the resolution depends on how https://bugs.webkit.org/show_bug.cgi?id=61904 is resolved, which isn't something I can't really handle personally.  I hope that we can just delete this all.  In the meantime, however, it seems unfair to penalize all of the other ports (and WK2) with this.

I think you misunderstood me here, too.  I am fully in support of cleanup now that removes the penalty for all the other ports.  I, too, hope we can just delete this all.

The patch I suggested moves all of the code that needs to be removed to exactly one place in DocumentLoader.h, and tags the call sites that can be simplified later with the bug # that will simplify them.
Comment 14 Brady Eidson 2011-06-01 22:45:48 PDT
Comment on attachment 95722 [details]
guards entire functions + callsites

I guess this is simply enough that we don't really need to tag the callsites with the bug #  :)  Thanks for refining it (I hate branched #ifdefs!)
Comment 15 WebKit Commit Bot 2011-06-02 05:52:58 PDT
Comment on attachment 95722 [details]
guards entire functions + callsites

Clearing flags on attachment: 95722

Committed r87901: <http://trac.webkit.org/changeset/87901>
Comment 16 WebKit Commit Bot 2011-06-02 05:53:04 PDT
All reviewed patches have been landed.  Closing bug.
Comment 17 James Robinson 2011-06-02 20:12:59 PDT
Reverted r87901 for reason:

Might

Committed r87990: <http://trac.webkit.org/changeset/87990>
Comment 18 Scott Graham 2011-07-18 12:26:11 PDT
Created attachment 101188 [details]
generate large images via data urls

also at www/~scottmg/imgmemory.html inside Google
Comment 19 Scott Graham 2011-07-18 12:33:44 PDT
Created attachment 101189 [details]
exclude data urls from m_resourcesClientKnowsAbout
Comment 20 Scott Graham 2011-07-18 12:37:37 PDT
Attached test doesn't leak any longer. There's still a pretty high high-watermark due to image cache, but can run indefinitely now.
Comment 21 Scott Graham 2011-07-18 16:36:03 PDT
Oops, should have added discussion to comments.

Previous attempt at a fix by removing the cache caused a performance regression of around 3% in Chromium page_cycler tests. New patch instead simply excludes data urls from the cache. This takes care of the majority of unwanted memory usage, while maintaining performance characteristics.
Comment 22 Scott Graham 2011-07-22 08:32:32 PDT
Created attachment 101735 [details]
exclude data urls from m_resourcesClientKnowsAbout (except on MAC pending bug 61904)
Comment 23 Scott Graham 2011-07-22 08:34:51 PDT
Same as previous patch, but exclude PLATFORM(MAC) until someone has time to look at https://bugs.webkit.org/show_bug.cgi?id=61904

Now only affects Chromium based on previous discussion.
Comment 24 James Robinson 2011-07-22 12:34:32 PDT
Comment on attachment 101735 [details]
exclude data urls from m_resourcesClientKnowsAbout (except on MAC pending bug 61904)

R=me
Comment 25 WebKit Review Bot 2011-07-22 13:17:57 PDT
Comment on attachment 101735 [details]
exclude data urls from m_resourcesClientKnowsAbout (except on MAC pending bug 61904)

Clearing flags on attachment: 101735

Committed r91600: <http://trac.webkit.org/changeset/91600>
Comment 26 WebKit Review Bot 2011-07-22 13:18:03 PDT
All reviewed patches have been landed.  Closing bug.