Bug 10773 - Memory usage grows when reloading google.com/ig
Summary: Memory usage grows when reloading google.com/ig
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: 420+
Hardware: All OS X 10.4
: P1 Major
Assignee: Nobody
URL: http://www.google.com/ig
Keywords: HasReduction, InRadar
Depends on:
Blocks:
 
Reported: 2006-09-07 12:16 PDT by Sornalatha Rathnasamy
Modified: 2007-01-21 13:10 PST (History)
7 users (show)

See Also:


Attachments
JS object leak test case (932 bytes, text/html)
2007-01-14 21:32 PST, Sanjay Madhav (chmmravatar)
no flags Details
Further reduction (143 bytes, text/html)
2007-01-16 05:35 PST, mitz
no flags Details
Possible fix (905 bytes, patch)
2007-01-16 08:39 PST, mitz
no flags Details | Formatted Diff | Diff
potential solution (2.68 KB, patch)
2007-01-17 03:11 PST, Sanjay Madhav (chmmravatar)
no flags Details | Formatted Diff | Diff
A better fix? (4.51 KB, patch)
2007-01-19 04:13 PST, Sanjay Madhav (chmmravatar)
no flags Details | Formatted Diff | Diff
clean way to fix it (9.55 KB, patch)
2007-01-20 01:37 PST, Sanjay Madhav (chmmravatar)
no flags Details | Formatted Diff | Diff
clean way w/ very minor style changes (8.85 KB, patch)
2007-01-20 02:29 PST, Sanjay Madhav (chmmravatar)
no flags Details | Formatted Diff | Diff
clean fix w/ changes + test case (15.24 KB, patch)
2007-01-20 20:12 PST, Sanjay Madhav (chmmravatar)
no flags Details | Formatted Diff | Diff
fixed changelog whitespace from last patch (15.22 KB, patch)
2007-01-20 20:24 PST, Sanjay Madhav (chmmravatar)
darin: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Sornalatha Rathnasamy 2006-09-07 12:16:13 PDT
1) Load www.google.com/ig
2) reload the page. 

Result: Memory leak of about 4 MB happens for each reload. Same memory leak happens when a link in top stories is visited and back to the main page.
Comment 1 Alexey Proskuryakov 2006-09-08 13:41:21 PDT
I'm seeing that the RAM size taken by Safari+TOT increases after each reload, but couldn't find an actual leak yet. Could you please describe the steps you perform in maximum detail? Do you close the window and/or reset the caches? Also, do you use a downloaded nightly, or a local WebKit build (if the latter, is it debug or release)?
Comment 2 Mark Rowe (bdash) 2006-11-05 19:22:33 PST
I cannot reproduce any leak by following the instructions provided.  If you are still able to reproduce this, I would appreciate if you could reopen this bug report and provide the information that Alexey asked for.
Comment 3 Krishna 2006-11-29 12:20:19 PST
(In reply to comment #1)
> I'm seeing that the RAM size taken by Safari+TOT increases after each reload,
> but couldn't find an actual leak yet. Could you please describe the steps you
> perform in maximum detail? Do you close the window and/or reset the caches?
> Also, do you use a downloaded nightly, or a local WebKit build (if the latter,
> is it debug or release)?

Yes, RAM size increases for each reload. The following are the RAM sizes in MBs
21.88 - when launching Webkit (Safari)
31.18 - when loading www.google.com/ig
39.00 - first reload
42.68 - second reload
49.00 - third reload

The webkit build is taken from http://nightly.webkit.org
Webkit nightly build dated Nov 29
Safari Version 2.0.3 (417.9.2)

Reproduction step:
Launch Activity Monitor to see the RAM size
Launch the Webkit icon (Safari browser)
Load www.google.com/ig
Reload the page as many times as needed

We are not sure whether this is a leak but webkit is consuming more RAM when reloading the page everytime and hence performance would be deteriorated.
Comment 4 Sanjay Madhav (chmmravatar) 2007-01-13 03:14:48 PST
After researching this, I have some further information on the cause.

1.) If you are logged in to your google account, go ahead and remove everything from your home page other than the Calendar. Note that the bug still occurs. So that narrows it down somewhat.

2.) It certainly is an issue with JS objects being referenced when they no longer should. Several thousand JS objects are NOT getting GC'ed on the refresh, which is most likely where all the increased memory usage is coming from.

So the next step will be to narrow down the problem even further.
Comment 5 Sanjay Madhav (chmmravatar) 2007-01-13 18:25:59 PST
FWIW, this same JS leak issue appears to occur with gmail as well, but on an even greater magnitude. Oddly enough though, if you go to the actual google calendar website the issue does not appear to occur. So the calendar widget and gmail have something that they're both doing in common.
Comment 6 Alexey Proskuryakov 2007-01-14 03:11:51 PST
On each reload, there is one more Document and three more HTMLDocuments in the heap (after garbage collection, of course).
Comment 7 Alexey Proskuryakov 2007-01-14 05:17:22 PST
Actually, that's:
- a Document prototype;
- a WebCore::HTMLDocument prototype;
- a KJS::HTMLDocument prototype;
- and one actual HTMLDocument.
Comment 8 Sanjay Madhav (chmmravatar) 2007-01-14 17:51:05 PST
After pouring over their javascript, I was able to figure out specifically which line was causing the problem. It appears to be the one where they attach an event listener onto document for the "keydown" event. I'm going to now attempt to reproduce the issue in a much smaller script file.
Comment 9 Sanjay Madhav (chmmravatar) 2007-01-14 21:32:39 PST
Created attachment 12439 [details]
JS object leak test case

I've attached a test case which demonstrates this leak on the most basic level. It seems like there are a few key things that need to happen in order to cause the leak:

1. Declare a function-object.
2. In the function object, declare a handler which does something with the document object.
3. Allocate further memory in the function object to make the leak detectable.
4. Set a variable equal to document, and then run addEventListener on that variable, attaching the handler to "keydown".
5. Then create a new instance of your function object.

The attached html file will leak approximately 4mb per refresh (or 43 JS objects if you're monitoring the caches window.
Comment 10 Sanjay Madhav (chmmravatar) 2007-01-15 01:37:23 PST
After further investigation, one way to get the leak to stop happening is by moving the call to forgetAllDOMNodesForDocument from Document::~Document() to Document::detach(). However, this is not a real solution since documents can potentially reattach themselves. Also, this change only affects the test case and the www.google.com/ig case, but www.gmail.com still leaks memory. But it does give some further insight as to what is happening.

Basically, in the ig leak case, the JS engine is holding a reference to the Document object. Since that reference is held, the Document destructor is never called, so the Document holds a reference to all the other DOM objects in the page. So the leak is the Document + all of its child objects still being held by native code and as a result unable to be freed.

So the key to fixing this will be figuring out why the JS engine is holding onto the Document object in this particular case, whereas in other similar cases it correctly releases it. Once we can get the JS engine to relinquish its reference to the Document, that will allow ~Document() to be called, which will clear out all the other objects.

As some added notes, bdash was able to reduce the test case further here: http://paste.lisp.org/display/35330

And othermaciej had this to add in IRC about the conditions in which the reference to Document is being held:
"I would guess you have to assign the variable holding the document or the document itself or something with the document in scope to the prototype"
Comment 11 mitz 2007-01-16 05:35:18 PST
Created attachment 12482 [details]
Further reduction

This demonstrates the root cause of the leak: a wrapper holding a reference to the document (in previous reductions the reference was very indirect, going through the __defineSetter__ property of the Object prototype and then the Function prototype). This completes a ref cycle.
Comment 12 mitz 2007-01-16 06:48:41 PST
An ugly fix would be to add JSCell::setMarked() and use it in ScriptInterpreter::mark() to prevent nodes from marking their document (if the document is not marked, set its marked bit and remember this; mark nodes for the document; if the document was not marked to begin with, reset its marked bit).
Comment 13 mitz 2007-01-16 08:39:11 PST
Created attachment 12488 [details]
Possible fix

Fixes the leak. No test regressions.
Comment 14 mitz 2007-01-16 08:53:08 PST
Comment on attachment 12488 [details]
Possible fix

Bad idea.
Comment 15 mitz 2007-01-16 09:41:18 PST
(In reply to comment #12)
> An ugly fix would be to add JSCell::setMarked() and use it in
> ScriptInterpreter::mark() to prevent nodes from marking their document (if the
> document is not marked, set its marked bit and remember this; mark nodes for
> the document; if the document was not marked to begin with, reset its marked
> bit).
> 

Also a bad idea!
Comment 16 Sanjay Madhav (chmmravatar) 2007-01-16 09:49:28 PST
I made some mark graphs which should help out those people who can't visualize what's happening (like I couldn't!)

http://paste.lisp.org/display/35471
Comment 17 Sanjay Madhav (chmmravatar) 2007-01-16 11:49:57 PST
At work so I can't ask this in IRC, but I was thinking...(bear with me here as I'm explaining this to myself while typing it out)...

We're kind of looking at it in terms of "let's prevent the Document from getting marked."

But logically thinking, wouldn't it make sense that if an HTMLBodyElement or HTMLHeadElement WAS marked, then the document that owns it has to be marked as well, because that element is contained by the document?

So the issue really is figuring out why the HTMLBodyElement or HTMLHeadElement is getting marked, and prevent that from happening.

Browsing the code some more, I'm curious (though I can't really test it until I get home) whether the issue is maybe something like this:

1. You hit refresh, and a new HTMLDocument is created.
2. The new HTMLDocument adopts the previous document's HTMLBodyElement or HTMLHeadElement, for whatever reason (it maybe needs to be kept alive temporarily on the refresh to make sure onunload handlers get their chance to run).
3. However, even though the old HTMLBodyElement or old HTMLHeadElement is now adopted by the new HTMLDocument, it STILL has a reference to the old HTMLDocument in it's node tree.
4. So now when the old HTMLBodyElement/HeadElement gets marked, even though it's now owned by the new HTMLDocument, it marks the old HTMLDocument because it still has a reference to it in its tree.
5. Now that the old HTMLDocument is marked, its destructor can never be called. This is important because it means forgetAllDOMNodesForDocument never gets called, which then leaks the old HTMLDocument + any DOM Nodes that weren't adopted by the new HTMLDocument
6. Then the vicious cycle continues for every subsequent refresh.


This might also explain why in the case of bdash's simplification without the window event handlers, if you never press a key and just click the refresh button, you don't leak. The reason for this is since the "keydown" listener isn't being processesed, the old HTMLHead doesn't need to be kept alive. The "keydown" listener is accessing the HTMLHead, so if a key is being pressed (which happens when you Cmd-R) when you refresh, the HTMLHead gets adopted by the new document, and causing the pattern as described above.

So then the "fix" if this is what's happening is to make sure that in updateDOMNodeDocument, it traverses through the tree and breaks off any links to the old HTMLDocument.

*Note the above comments were from information not actually garnered while debugging, so it's certainly feasible that I'm totally off-base here!
Comment 18 Sanjay Madhav (chmmravatar) 2007-01-16 11:52:40 PST
Please ignore everything I wrote prior to the "Browsing the code" line, I typed up that message over the course of a long time going back and forth, that I didn't realize the first few sentences completely contradicted what I talk about later on :P.

(In reply to comment #17)
> Browsing the code some more, I'm curious (though I can't really test it until I
> get home) whether the issue is maybe something like this:
> 
> 1. You hit refresh, and a new HTMLDocument is created.
> 2. The new HTMLDocument adopts the previous document's HTMLBodyElement or
> HTMLHeadElement, for whatever reason (it maybe needs to be kept alive
> temporarily on the refresh to make sure onunload handlers get their chance to
> run).
> 3. However, even though the old HTMLBodyElement or old HTMLHeadElement is now
> adopted by the new HTMLDocument, it STILL has a reference to the old
> HTMLDocument in it's node tree.
> 4. So now when the old HTMLBodyElement/HeadElement gets marked, even though
> it's now owned by the new HTMLDocument, it marks the old HTMLDocument because
> it still has a reference to it in its tree.
> 5. Now that the old HTMLDocument is marked, its destructor can never be called.
> This is important because it means forgetAllDOMNodesForDocument never gets
> called, which then leaks the old HTMLDocument + any DOM Nodes that weren't
> adopted by the new HTMLDocument
> 6. Then the vicious cycle continues for every subsequent refresh.
> 
> 
> This might also explain why in the case of bdash's simplification without the
> window event handlers, if you never press a key and just click the refresh
> button, you don't leak. The reason for this is since the "keydown" listener
> isn't being processesed, the old HTMLHead doesn't need to be kept alive. The
> "keydown" listener is accessing the HTMLHead, so if a key is being pressed
> (which happens when you Cmd-R) when you refresh, the HTMLHead gets adopted by
> the new document, and causing the pattern as described above.
> 
> So then the "fix" if this is what's happening is to make sure that in
> updateDOMNodeDocument, it traverses through the tree and breaks off any links
> to the old HTMLDocument.
> 
> *Note the above comments were from information not actually garnered while
> debugging, so it's certainly feasible that I'm totally off-base here!
> 

Comment 19 Mark Rowe (bdash) 2007-01-16 18:51:10 PST
<rdar://problem/4928583>
Comment 20 Geoffrey Garen 2007-01-16 23:57:32 PST
Maciej suggested what I think is a good solution: Make the document, rather than the ScriptInterpreter, responsible for marking all of its nodes' wrappers. Once the document becomes unreachable, the wrappers will be collected.
Comment 21 Sanjay Madhav (chmmravatar) 2007-01-17 03:11:33 PST
Created attachment 12501 [details]
potential solution

I'm not really submitting this for review at this point because I haven't done sufficient testing (and there's also a good chance that the check-in is not good). But preliminary testing seems to indicate that it doesn't break existing functionality, and it certainly does fix the www.google.com/ig memory leak. However, FWIW gmail still leaks a lot of objects, so it's quite clearly due to another issue. So even after we close this bug out we may need to open a new one for gmail or something.

Now the only real potential downfall I could see is if it's a possibility through normal execution that the Window.m_frame loses its document and then gets it back. If that can happen, then with this change the document will get garbage collected in between these two points.
Comment 22 Sanjay Madhav (chmmravatar) 2007-01-17 03:40:12 PST
Comment on attachment 12501 [details]
potential solution

Yeah so I was right about this being bad :). This method doesn't take into account documents which may have been created but never actually attached to a window (such as due to an XMLHttpRequest).
Comment 23 Sanjay Madhav (chmmravatar) 2007-01-19 04:13:00 PST
Created attachment 12554 [details]
A better fix?

I'm submitting this as hopefully a solid way to fix the problem. I haven't ran the test suite against it yet, but at least in fairly extensive browsing of many different pages which were JS-intensive, nothing appeared to broke, and it most definitely fixed this particular kind of memory leak (though it still has not fixed gmail). If the method I use is agreeable then I'll of course run the automated tests and fix any regression/speed issues if they pop up, but I honestly can't imagine it being noticably more costly than what's currently in there.

The basic premise of my patch is to store off pointers to a Document, JSDocument pair whenever a JSDocument is created in toJS(). Once those are stored off, in ScriptInterpreter::mark rather than doing what it currently does with the DOM wrappers, it iterates through the cache, and checks the refcount of the Document to determine whether the JSDocument and any DOM wrappers associated with it need to be marked.

My understanding of the Document refcount is as such (please correct me if I'm wrong):

1. If a Document doesn't have a JSDocument associated with it (which only happens in the case of no JS whatsoever in the file), it means its refcount will be 1 while it's being used, and then it hits 0 and takes care off itself. But, since no JSDocument is created, the pair won't be added to the list of Documents to check during mark, so it's behavior will be no different than it is currently.

2. If a Document w/ a JSDocument has a refcount greater than 1, it means it's currently in use, and as such any DOM wrappers associated with that particular document need to be marked.

3. If a Document w/ a JSDocument has a refcount of 1, it means that the ONLY thing left pointing to it is its JSDocument. In this case, don't mark any of the DOM wrappers associated with that document. When the collector then gets to its sweep phase, all the DOM wrappers for this document will be swept, including its JSDocument. When the JSDocument gets swept, Document loses its last refcount, and then gets deleted itself.


Now, I know that the more popular solution seems to be to make it such that JSDocument::mark handles checking whether or not it marks itself, and then if it does, have it mark all the wrappers associated with it. The logic of my patch is very similar (eg., the document ultimately decides who's getting marked), but I think my implementation has less risk involved with it. It seemed to me that in order for the JSDocument::mark approach to work, a few things would be necessary:

1. First, regardless of whether or not you do it this way, you are still going to need to keep track of all the JSDocuments that currently exist somehow, since nothing seems to do this currently. Now it's true that the wrapper hash map does get Document,JSDocument pairs inserted into it, but it has every other DOM wrapper inserted into it as well, which means it'd be terribly inefficient for the purpose of tracking which JSDocuments exist.

2. But even then, how does JSDocument itself know whether or not it shouldn't mark itself? It's going to coexist with the Document object, and it's always going to be referenced by some other node in the scenario which causes this memory leak. So it seems like the only way it'd be able to make that decision would be through it's Document, in which case assuming that my logic was correct, the refCount way would be the quickest to know whether or not native code still needs the document or not.

3. The third, and perhaps biggest issue as I saw it was that currently, when you have a live JSDocument and run mark on it, it just marks itself and the JSDocumentPrototype JSObjects. It does not have any links between itself and any of the elements in the actual file, so in order to get this to work the interpreter's behavior would have to be changed such that when it starts up on a new document it creates its new JSDocument as soon as it can, and the either store off the pointer to it, or passes it around a lot. Then you'd have to change quite a few entry points where new JSObjects are created, and make sure that if they're part of the DOM hierarchy, that they get added into the JSDocument tree structure, or however it is that the JSObjects are linked together such that if JSDocument was to mark itself, there would be an easy tree to traverse through in order to make sure all its children were marked as well. Now it could be that this part of the code is still a little over my head, but from looking at it, it seemed to me that the solution I implemented would just work very similar to what was desired, and additionally be much less risky than changing the whole JSObject hierarchy.

*Please note that I was rather tired when I wrote this, so if parts don't make sense I'll clarify tomorrow.
Comment 24 Sanjay Madhav (chmmravatar) 2007-01-20 01:37:59 PST
Created attachment 12567 [details]
clean way to fix it

Ok, after talking to othermaciej some more in IRC, I finally understood how JSDocument::mark would only be called if the Document is actually still in use. So this patch implements a custom mark for JSDocument which marks itself, and then marks all the DOM wrappers associated with its Document.

I've tested the attached patch it on many JS-intensive websites and it does not appear to break anything. Additionally, I ran the
Comment 25 Sanjay Madhav (chmmravatar) 2007-01-20 01:42:53 PST
(In reply to comment #24)
Er, oops, I cut it off accidentally. It should say "Additionally, I also ran the javascript-core tests, and 0 errors or regressions were reported."
Comment 26 Sanjay Madhav (chmmravatar) 2007-01-20 02:29:23 PST
Created attachment 12568 [details]
clean way w/ very minor style changes

Just very minor changes from the last patch, from mitzpettel's comments. Also, I ran run-webkit-tests, and there did not appear to be any that were failing as a result of my patch.
Comment 27 Darin Adler 2007-01-20 08:14:39 PST
Comment on attachment 12568 [details]
clean way w/ very minor style changes

+    //we need to mark any DOM wrappers associated with this document

Normally we put spaces after the "//" for comments. Also, I'm not sure this comment is good. It just repeats what the function name says. If the issue is clarifying that these are "wrappers" then I think the ScriptInterpreter function should be named markDOMWrappersForDocument. And lets omit the comment.

         NodeMap::iterator nodeEnd = nodeDict->end();
+        
         for (NodeMap::iterator nodeIt = nodeDict->begin(); nodeIt != nodeEnd; ++nodeIt) {

I don't think that adding this blank line improves paragraphing/formatting.

Is there any way to test this in a regression test?

r=me
Comment 28 Sanjay Madhav (chmmravatar) 2007-01-20 20:12:36 PST
Created attachment 12577 [details]
clean fix w/ changes + test case

I removed the comment and extra newline from the code as suggested.

Additionally, I added support for a test function to grab the current JS object count, and constructed a test case which checks to see if the JS objects leak after refreshing an iframe which contains Mitz's "ultra" reduction for this bug.
Comment 29 Sanjay Madhav (chmmravatar) 2007-01-20 20:24:14 PST
Created attachment 12578 [details]
fixed changelog whitespace from last patch
Comment 30 Darin Adler 2007-01-21 08:21:35 PST
Comment on attachment 12578 [details]
fixed changelog whitespace from last patch

r=me
Comment 31 Sam Weinig 2007-01-21 13:10:07 PST
Landed in r19013.