Bug 62292

Summary: [Chromium] Worker object may be garbage collected even if it has message handlers
Product: WebKit Reporter: Yury Semikhatsky <yurys>
Component: WebCore Misc.Assignee: Yury Semikhatsky <yurys>
Status: RESOLVED FIXED    
Severity: Normal CC: ap, caseq, dimich, dslomov, levin, pfeldman
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Attachments:
Description Flags
Repro case 1
none
Repro case 1
none
Reduced test case
none
Patch
dimich: review+
Patch for landing none

Yury Semikhatsky
Reported 2011-06-08 09:34:39 PDT
Currently if you create a dedicated worker by calling the following function: function createWorker() { var worker = new Worker("worker.js"); worker.onmessage = function(event) { console.log("Message from worker: " + event.data); }; worker.postMessage("ping"); } the worker object may well be GC'ed before any message from the worker context is received. This behavior looks confusing. The spec doesn't say anything specific about this while in many examples there no references to the worker object to which onmessage/onerror handlers are attached which means that those samples may be affected by this issue. Should we protect the worker from being collected while it has message handlers? Probably this problem has already been discussed.
Attachments
Repro case 1 (651 bytes, text/html)
2011-06-08 23:32 PDT, Yury Semikhatsky
no flags
Repro case 1 (240 bytes, application/javascript)
2011-06-08 23:32 PDT, Yury Semikhatsky
no flags
Reduced test case (826 bytes, application/zip)
2011-06-09 07:04 PDT, Yury Semikhatsky
no flags
Patch (3.60 KB, patch)
2011-06-09 07:32 PDT, Yury Semikhatsky
dimich: review+
Patch for landing (3.83 KB, patch)
2011-06-10 00:34 PDT, Yury Semikhatsky
no flags
Dmitry Titov
Comment 1 2011-06-08 10:51:01 PDT
Is there a repro case for this?
Alexey Proskuryakov
Comment 2 2011-06-08 11:18:18 PDT
In your example, the worker object will be GC protected because there is an outstanding message. Once it's handled and acknowledged, and we also know that it hasn't triggered a timer or XHR inside the worker thread, it should be fine to collect the worker, because we know that it can't possibly be messaged again from either side. Having handlers always GC protect the object would make it extremely likely to leak. So, this is correct behavior. Do you have an example where the implementation doesn't match the design that I described above?
Andrey Kosyakov
Comment 3 2011-06-08 11:24:30 PDT
Doesn't fast/workers/worker-gc.html aim at testing just this? I wonder what's so different in Yury's example.
Dmitry Titov
Comment 4 2011-06-08 12:16:22 PDT
(In reply to comment #3) > Doesn't fast/workers/worker-gc.html aim at testing just this? I wonder what's so different in Yury's example. During test in worker-gc.html, the Worker always has 'pending activity' in the form of a message being sent and replied. In the Yury.s example, I would expect the "ping" message to be delivered to the worker context, and if the worker context replies with another message from its onmessage handler, it would be delivered back and appended to console log. After that, if there is no pending activity in the worker context (no timers, xhrs, database requests, file operations) - the worker becomes 'unprotected' (as spec says) in a sense that it can not fire any new events and thus can be GC'ed safely, together with its event handlers. However, if in this example the worker context replies back but the even handler is not firing because it was GC'ed, it's a bug.
Yury Semikhatsky
Comment 5 2011-06-08 23:32:27 PDT
Created attachment 96551 [details] Repro case 1
Yury Semikhatsky
Comment 6 2011-06-08 23:32:52 PDT
Created attachment 96552 [details] Repro case 1
Yury Semikhatsky
Comment 7 2011-06-08 23:50:59 PDT
(In reply to comment #1) > Is there a repro case for this? Yes, see attached worker.html and worker.js. worker.js periodically sends a message to its worker object so the worker context is not collected. However the worker object is not GC protected and I see only part of the messages printed in the page. I can reliably reproduce it on Chromium Linux after the page refresh: messages from worker #0 will never be printed.
Alexey Proskuryakov
Comment 8 2011-06-08 23:57:14 PDT
I cannot reproduce this in WebKit nightly r88132 (after changing the test to not use bind()). I forced garbage collection several times manually when running the test.
Yury Semikhatsky
Comment 9 2011-06-09 00:14:02 PDT
(In reply to comment #2) > In your example, the worker object will be GC protected because there is an outstanding message. Once it's handled and acknowledged, and we also know that it hasn't triggered a timer or XHR inside the worker thread, it should be fine to collect the worker, because we know that it can't possibly be messaged again from either side. > Does it mean that the worker object should stay alive while there is a pending activity in the corresponding worker context? > Having handlers always GC protect the object would make it extremely likely to leak. > Yeah, I understand that. As I see in current implementation tracking of pending activities in the worker context is supposed to address the issue. > So, this is correct behavior. Do you have an example where the implementation doesn't match the design that I described above? I posted my example, it looks like a bug in Chromium's implementation which appears after page refresh.
Yury Semikhatsky
Comment 10 2011-06-09 00:15:10 PDT
(In reply to comment #8) > I cannot reproduce this in WebKit nightly r88132 (after changing the test to not use bind()). I forced garbage collection several times manually when running the test. I suspect this is Chromium-specific, let me try to reproduce it on WebKit ToT.
Yury Semikhatsky
Comment 11 2011-06-09 00:38:14 PDT
(In reply to comment #10) > (In reply to comment #8) > > I cannot reproduce this in WebKit nightly r88132 (after changing the test to not use bind()). I forced garbage collection several times manually when running the test. > > I suspect this is Chromium-specific, let me try to reproduce it on WebKit ToT. Yeah, I couldn't reproduce it on WebKit r88431 Linux Qt.
Yury Semikhatsky
Comment 12 2011-06-09 07:04:35 PDT
Created attachment 96582 [details] Reduced test case The case doesn't fail in debug mode with a shorter interval in worker-pending.js as in that case message from the worker context seems to be delivered before acknowledge for the message from the worker.
Yury Semikhatsky
Comment 13 2011-06-09 07:32:15 PDT
Yury Semikhatsky
Comment 14 2011-06-09 07:37:33 PDT
(In reply to comment #13) > Created an attachment (id=96587) [details] > Patch I'd like to convert the test case into a layout test but I don't see yet how to do this so that the test isn't too slow and reliable. The problem with test is that full-GC needs to be triggered right after WebWorkerClientImpl::confirmMessageFromWorkerObject is called and there is no way to hook WebWorkerClientImpl::confirmMessageFromWorkerObject from JavaScript.
Alexey Proskuryakov
Comment 15 2011-06-09 08:48:22 PDT
Did you try forcing garbage collection to make the test faster? There is a gc() function in js-test-pre.js.
Yury Semikhatsky
Comment 16 2011-06-09 08:50:28 PDT
WebWorkerClientImpl currently inverts order of dispatching of confirmMessageFromWorkerObject and postMessageToWorkerObject. Let assume that WebWorkerClientImpl::postMessageToWorkerObject is called right after WebWorkerClientImpl::confirmMessageFromWorkerObject. WebWorkerClientImpl::confirmMessageFromWorkerObject will unconditionally schedule asynchronous task while WebWorkerClientImpl::postMessageToWorkerObject will check if current thread is the same where the client was created (which is always true for workers created in Page) and will dispatch the message synchronously i.e. before WebWorkerClientImpl::confirmMessageFromWorkerObjectTask is executed. Looks like a bug to me. Dmitry, could clarify this?
Yury Semikhatsky
Comment 17 2011-06-09 08:53:31 PDT
(In reply to comment #15) > Did you try forcing garbage collection to make the test faster? There is a gc() function in js-test-pre.js. Yes, tried it both by inserting: function triggerGC() { var head; for (var i = 0; i < 10000; i++) { var newHead = new Array(1000); newHead[0] = head; head = newHead; } } setInterval(triggerGC, 0); and function triggerGC() { var head; for (var i = 0; i < 10000; i++) { var newHead = new Array(1000); newHead[0] = head; head = newHead; } gc(); } setInterval(triggerGC, 0); with no success. If the interval between acknowledge and message from worker context is short enough, we cannot be sure that triggerGC() is invoked in between.
Dmitry Titov
Comment 18 2011-06-09 12:13:28 PDT
(In reply to comment #16) > WebWorkerClientImpl currently inverts order of dispatching of confirmMessageFromWorkerObject and postMessageToWorkerObject. > Let assume that WebWorkerClientImpl::postMessageToWorkerObject is called right after WebWorkerClientImpl::confirmMessageFromWorkerObject. WebWorkerClientImpl::confirmMessageFromWorkerObject will unconditionally schedule asynchronous task while WebWorkerClientImpl::postMessageToWorkerObject will check if current thread is the same where the client was created (which is always true for workers created in Page) and will dispatch the message synchronously i.e. before WebWorkerClientImpl::confirmMessageFromWorkerObjectTask is executed. > > Looks like a bug to me. Dmitry, could clarify this? I think there is no bug there, at least at first look. It's fine to delay confirmation, it only will keep the Worker object alive for a bit longer. However, loosing the pending activity information that you are fixing in your patch is a real bug. Very nice catch! Note that this code will soon be eliminated (dslomov@chromium.org works on moving dedicated workers back in process so they won't use Chromium WebWorker API anymore. Shared workers will still go via Chromium API, but they have different lifetime tracking. This work will also enable running dedicated workers tests in DRT for Chromium port (currently all workers layout tests are disabled in test_expectations) - so unless you want to add a ui_test for this failure, I don't see how it can be tested. So I'm going to r+ your patch w/o test.
Dmitry Titov
Comment 19 2011-06-09 12:14:45 PDT
Comment on attachment 96587 [details] Patch r=me. Please add a comment to ChangeLog on why there is no layout test, and rename the bug to have '[Chromium]' in it, including in ChangeLog.
Yury Semikhatsky
Comment 20 2011-06-10 00:34:22 PDT
Created attachment 96703 [details] Patch for landing Changelog entry was modified to include information about testing the change.
Yury Semikhatsky
Comment 21 2011-06-10 00:38:53 PDT
Yury Semikhatsky
Comment 22 2011-06-10 00:44:17 PDT
(In reply to comment #18) > (In reply to comment #16) > Note that this code will soon be eliminated (dslomov@chromium.org works on moving dedicated workers back in process so they won't use Chromium WebWorker API anymore. Is it in a usable state yet? Any ETA? Then I'd rather focus on the inspection of the new in-process dedicated workers instead of spending time on the current implementation which involves modifications in Chromium IPC.
Dmitry Lomov
Comment 23 2011-06-10 01:06:45 PDT
(In reply to comment #22) > (In reply to comment #18) > > (In reply to comment #16) > > Note that this code will soon be eliminated (dslomov@chromium.org works on moving dedicated workers back in process so they won't use Chromium WebWorker API anymore. > > Is it in a usable state yet? Any ETA? Then I'd rather focus on the inspection of the new in-process dedicated workers instead of spending time on the current implementation which involves modifications in Chromium IPC. My ETA for this is end of June. You can track progress at https://bugs.webkit.org/show_bug.cgi?id=61016
Note You need to log in before you can comment on or make changes to this bug.