RESOLVED FIXED Bug 105367
[chromium] Creation of dedicated workers (isolates) leaks reference to HTMLDocument
https://bugs.webkit.org/show_bug.cgi?id=105367
Summary [chromium] Creation of dedicated workers (isolates) leaks reference to HTMLDo...
Brandon Jones
Reported 2012-12-18 16:32:04 PST
It appears that some C++ objects associated with Javascript wrappers are not being properly destroyed if a Worker was created during page execution. We have primarily noticed this with WebGL contexts, which provide a convenient visualization of the issue in older versions of Chrome, but suspect that it may be happening with other objects as well. If a WebGL context and Worker object are both created on the same page, refreshing that page several times will force the WebGL context to aquire a smaller than requested backbuffer after hitting internal memory limits. On pages without workers these limits are not hit because objects are collected properly. A more concrete illustration is to add some instrumentation to the WebKit source. In WebCore/html/canvas/WebGLRenderingContext.cpp, add a logging to the WebGLRenderingContext constructor and destructor so that you can monitor when they are created and destroyed. (This modification is included as a patch in the attached zip). To see the issue using the attached tests extract them locally and make sure they are accessible from a local web server (If you don't have a local web server set up, navigate to the folder the files were extracted to and run "python -m SimpleHTTPServer". The files should now be accessible at http://localhost:8000) Viewing index.html will then start the test, which loads a simple WebGL page with a worker in an iframe and refreshes it multiple times. If the WebGL backbuffer resolution is different than the requested resolution the canvas will turn red and the test will stop. If you have added logging to the WebGLRenderingContext constructor and destructor you will see that the destructor is never called. Contrast this with the behavior of index-no-worker.html, which only differs in that the page does not create a worker and displays expected garbage collection patterns where the destructors are called in batches. In the case where the contexts are not destroyed the Chrome heap profiler indicates that there are no live WebGLRenderingContext objects other than the ones on the current instance of the page, but even a full GC will not force the destructors to be called properly.
Attachments
Test pages to provoke the described error (7.93 KB, application/zip)
2012-12-18 16:33 PST, Brandon Jones
no flags
Simplified test case (1.05 KB, application/zip)
2012-12-27 13:19 PST, Kenneth Russell
no flags
Patch used to debug problem (21.17 KB, application/octet-stream)
2012-12-27 13:22 PST, Kenneth Russell
no flags
Patch (33.81 KB, patch)
2012-12-31 09:32 PST, Kenneth Russell
no flags
Patch (33.98 KB, patch)
2013-01-02 17:08 PST, Kenneth Russell
no flags
Patch for landing (34.07 KB, patch)
2013-01-02 18:41 PST, Kenneth Russell
no flags
Brandon Jones
Comment 1 2012-12-18 16:33:05 PST
Created attachment 180056 [details] Test pages to provoke the described error
Kenneth Russell
Comment 2 2012-12-18 17:00:43 PST
+some people who worked on the DOM bindings, isolates, and GC recently Initially we thought this was a bug in V8's GC, but because the WebGLRenderingContext objects aren't alive according to the heap profiler, I think it's a bug in the DOM bindings related to isolates. I have a bad feeling that if a dedicated worker is created, it may be preventing C++ objects which are referred to by JavaScript objects (whether those objects are in the DOM or not -- the WebGLRenderingContext isn't) from being properly unreferenced and deleted. I'm raising this to P1 until the cause is at least understood, as this bug could be causing major memory leaks for applications using workers. I don't think it's specific to WebGL. Please CC: anyone who may have worked in this area.
Kenneth Russell
Comment 3 2012-12-27 12:34:45 PST
I've been investigating this and have some updates which I'll post shortly. Assigning to myself.
Kenneth Russell
Comment 4 2012-12-27 13:16:38 PST
CC'ing a few more people who may be interested.
Kenneth Russell
Comment 5 2012-12-27 13:19:50 PST
Created attachment 180817 [details] Simplified test case
Kenneth Russell
Comment 6 2012-12-27 13:22:10 PST
Created attachment 180818 [details] Patch used to debug problem
Kenneth Russell
Comment 7 2012-12-27 13:36:51 PST
The problem is that the WebWorkerClientImpl instance (Source/WebKit/chromium/src/WebWorkerClientImpl.cpp) is never deleted, leaking a reference to the HTMLDocument via m_scriptExecutionContext. This prevents the destruction of all attached DOM elements, including the Canvas, so that when the JavaScript reference to the WebGLRenderingContext is GC'd, it doesn't ultimately cause the Canvas to be deleted. Since the Canvas owns its m_context via OwnPtr, the WebGL context and all associated resources are never cleaned up. Basically, any page using Workers in Chromium leaks the entire DOM when the page is reloaded. I found that the same leak didn't exist in Safari, despite a strong hunch that the problem wasn't a leak of a reference outside Chromium's WebKit API boundary. That should have been a hint about where to look. This bug was pretty tricky to track down. I'd appreciate any suggestions about how it could have been done better or more efficiently. Basically, per the attached patch, I instrumented TreeShared::ref() and deref() to fire when operating upon HTMLDocuments, since I found from early debugging that the Canvas wasn't being deleted and that was because the Document had a nonzero refcount at the last call to unref(). However, ref/unref were called too often against the document to debug directly. I added a mechanism to suppress the firing inside ref/unref and added those suppressions throughout WebKit. That got pretty close to the bug, but it turned out I suppressed the key place (within a call to ScriptElement::executeScript) where the unmatched call to HTMLDocument::ref was being made. I only got lucky because I saw a call to unref in WorkerMessagingProxy::~WorkerMessagingProxy and didn't see the construction, so made the suppression in executeScript more precise, which exposed the unbalanced call from WebWorkerClientImpl::WebWorkerClientImpl. I don't have a fix yet but am going to try to find one, as well as add a test to ensure this leak doesn't reappear.
Kenneth Russell
Comment 8 2012-12-31 09:32:02 PST
Adam Barth
Comment 9 2013-01-02 11:02:33 PST
Interesting! Who is the best person to review this patch?
David Levin
Comment 10 2013-01-02 11:14:43 PST
(In reply to comment #9) > Interesting! Who is the best person to review this patch? I'd guess dimich@
Dmitry Titov
Comment 11 2013-01-02 14:42:29 PST
Comment on attachment 180978 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=180978&action=review r- due to "void*" comment. > Source/WebCore/workers/WorkerLoaderProxy.h:62 > + virtual void* toWebWorkerBase() = 0; Lets try to forward-declare the return type here. In general, it's unfortunate to introduce the 'embedder context' into WorkerLoaderProxy. Is there a better solution?
Kenneth Russell
Comment 12 2013-01-02 17:08:57 PST
Kenneth Russell
Comment 13 2013-01-02 17:22:12 PST
Comment on attachment 180978 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=180978&action=review >> Source/WebCore/workers/WorkerLoaderProxy.h:62 >> + virtual void* toWebWorkerBase() = 0; > > Lets try to forward-declare the return type here. > > In general, it's unfortunate to introduce the 'embedder context' into WorkerLoaderProxy. Is there a better solution? New patch uploaded which forward declares the return type, and avoids static_casts in the implementations. I gave this a lot of thought but can't come up with a better solution. The basic problem is that the class hierarchies of Chromium's dedicated and shared worker implementations now diverge significantly. It used to be the case in Chromium that every implementation of WorkerLoaderProxy that hung off the WorkerThread derived from WebWorkerBase. That is now no longer the case. For dedicated workers, WebCore's WorkerMessagingProxy is now used as the base class. WebSharedWorkerImpl continues to derive from WebWorkerBase. This means that it is no longer legal to cast between WebWorkerBase* and WorkerLoaderProxy*. It would be possible to add a static method "WebWorkerBase::fromWorkerLoaderProxy", and avoid touching WorkerLoaderProxy.h, but it would be very fragile. If any new implementation of WorkerLoaderProxy were added, the implementation of WebWorkerBase::fromWorkerLoaderProxy would have to be updated -- and there is no way to force a compilation failure to give the developer a hint that that code needs to be updated. The presence of the pure virtual in WorkerLoaderProxy provides this essential information. In sum, as poor as it is to modify WorkerLoaderProxy with a Chromium-specific forward declaration and method, I think it is the best technical solution given the constraints.
Dmitry Titov
Comment 14 2013-01-02 17:36:50 PST
Comment on attachment 181115 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=181115&action=review r+ with a note. > Source/WebKit/chromium/src/WebWorkerClientImpl.cpp:73 > // Delegates implementation of Worker{Loader,Context,Object}Proxy to WorkerMessagingProxy. This comment seems old now... thsi class does not really delegate, it is a WMP... Remove?
Kenneth Russell
Comment 15 2013-01-02 17:39:22 PST
Comment on attachment 181115 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=181115&action=review >> Source/WebKit/chromium/src/WebWorkerClientImpl.cpp:73 >> // Delegates implementation of Worker{Loader,Context,Object}Proxy to WorkerMessagingProxy. > > This comment seems old now... thsi class does not really delegate, it is a WMP... Remove? You're right. I'll update the patch.
Kenneth Russell
Comment 16 2013-01-02 18:41:54 PST
Created attachment 181129 [details] Patch for landing Fixed comment and small typo in failure case of layout test
WebKit Review Bot
Comment 17 2013-01-02 19:23:49 PST
Comment on attachment 181129 [details] Patch for landing Clearing flags on attachment: 181129 Committed r138693: <http://trac.webkit.org/changeset/138693>
WebKit Review Bot
Comment 18 2013-01-02 19:23:54 PST
All reviewed patches have been landed. Closing bug.
Note You need to log in before you can comment on or make changes to this bug.