Bug 114432 - [EFL][WK2] Connection is leaking on process termination
Summary: [EFL][WK2] Connection is leaking on process termination
Status: RESOLVED WONTFIX
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Rafael Brandao
URL:
Keywords:
: 122044 (view as bug list)
Depends on: 115332
Blocks:
  Show dependency treegraph
 
Reported: 2013-04-11 03:50 PDT by Rafael Brandao
Modified: 2017-03-11 10:31 PST (History)
18 users (show)

See Also:


Attachments
Patch (6.97 KB, patch)
2013-04-11 03:56 PDT, Rafael Brandao
no flags Details | Formatted Diff | Diff
Patch (2.69 KB, patch)
2013-04-11 05:02 PDT, Rafael Brandao
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from webkit-ews-14 for mac-mountainlion-wk2 (949.01 KB, application/zip)
2013-04-11 05:24 PDT, Build Bot
no flags Details
Patch for efl's work queue (2.93 KB, patch)
2013-04-11 13:47 PDT, Lauro Moura Maranhao Neto
no flags Details | Formatted Diff | Diff
Updated patch for efl's work queue (1.35 KB, patch)
2013-04-11 14:52 PDT, Lauro Moura Maranhao Neto
no flags Details | Formatted Diff | Diff
Patch (3.27 KB, patch)
2013-04-12 09:41 PDT, Rafael Brandao
no flags Details | Formatted Diff | Diff
Patch (4.52 KB, patch)
2013-04-12 19:33 PDT, Rafael Brandao
no flags Details | Formatted Diff | Diff
Patch (7.72 KB, patch)
2013-04-24 18:29 PDT, Rafael Brandao
no flags Details | Formatted Diff | Diff
Patch (7.43 KB, patch)
2013-04-25 10:23 PDT, Rafael Brandao
no flags Details | Formatted Diff | Diff
Patch (3.67 KB, patch)
2013-05-15 17:18 PDT, Rafael Brandao
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Rafael Brandao 2013-04-11 03:50:24 PDT
ConnectionUnix is leaking on process termination
Comment 1 Rafael Brandao 2013-04-11 03:56:45 PDT
Created attachment 197568 [details]
Patch
Comment 2 Rafael Brandao 2013-04-11 04:02:16 PDT
I've left a few printfs so it can be easier to understand what is going on: Connection should be killed as soon as ChildProcessProxy clears the reference to it. This would cause the work queue to be destroyed and the thread resources to be cleaned up. Since the work queue is holding references to Connection and Connection owns that work queue, it will never be destroyed.

It looks a bit like what is done on platformInvalidate after the patch to bug #108140. Since this code is very tricky, I'd welcome feedback.
Comment 3 Early Warning System Bot 2013-04-11 04:02:24 PDT
Comment on attachment 197568 [details]
Patch

Attachment 197568 [details] did not pass qt-wk2-ews (qt):
Output: http://webkit-commit-queue.appspot.com/results/17670128
Comment 4 Rafael Brandao 2013-04-11 04:04:18 PDT
I believe this also happens to Qt, but I couldn't figure out how to run WK2 API tests there. I wonder if this is avoided on Mac. Anders, what do you think?
Comment 5 Rafael Brandao 2013-04-11 05:02:41 PDT
Created attachment 197575 [details]
Patch
Comment 6 Build Bot 2013-04-11 05:24:40 PDT
Comment on attachment 197575 [details]
Patch

Attachment 197575 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-commit-queue.appspot.com/results/17641090

New failing tests:
fast/repaint/japanese-rl-selection-repaint-in-regions.html
Comment 7 Build Bot 2013-04-11 05:24:43 PDT
Created attachment 197577 [details]
Archive of layout-test-results from webkit-ews-14 for mac-mountainlion-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: webkit-ews-14  Port: mac-mountainlion-wk2  Platform: Mac OS X 10.8.2
Comment 8 Adenilson Cavalcanti Silva 2013-04-11 08:36:46 PDT
About WK2 unit tests:

a) Qt:  WebKitBuild/Release/Tools/TestWebKitAPI/Tests/WebKit2/tst_webkit2

b) Mac: ./Tools/Scripts/run-api-tests --root=./WebKitBuild/Debug/

c) EFL: each test will be a binary in 'bin', i.e test_webkit2_api_TestName

About the patch: no tests?
Comment 9 Lauro Moura Maranhao Neto 2013-04-11 13:47:55 PDT
Created attachment 197664 [details]
Patch for efl's work queue

I managed to fix the leak in EFL's WorkQueue by changing the socket handler (a Connection bind) to a OwnPtr instead of a plain member, so it can be deref'ed when unregistering the socket handler.
Comment 10 Lauro Moura Maranhao Neto 2013-04-11 14:52:51 PDT
Created attachment 197677 [details]
Updated patch for efl's work queue

Simpler efl fix by just reassiging the handler, as suggested by andersca on IRC. Both versions of this fix are crashing on debug mode though.
Comment 11 Rafael Brandao 2013-04-12 09:41:25 PDT
Created attachment 197862 [details]
Patch
Comment 12 Rafael Brandao 2013-04-12 09:42:54 PDT
(In reply to comment #11)
> Created an attachment (id=197862) [details]
> Patch

This is based on Lauro Moura's fix and also fixes the crash. We've kept the work queue running even after deleting it on the main thread. Other ports like Qt waits for thread termination, so this is also based on their approach.
Comment 13 Rafael Brandao 2013-04-12 09:44:50 PDT
(In reply to comment #8)
> 
> About the patch: no tests?

We're open to suggestions on how to test this. It's very port specific and it's about leaking stuff, like file descriptors. :(
Comment 14 Jesus Sanchez-Palencia 2013-04-12 09:53:03 PDT
Comment on attachment 197862 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=197862&action=review

> Source/WebKit2/ChangeLog:1
> +2013-04-12  Rafael Brandao  <rafael.lobo@openbossa.org>

You should perhaps mention that it is based on previous patches by FOO or BAR...
Comment 15 Rafael Brandao 2013-04-12 09:54:48 PDT
(In reply to comment #14)
> (From update of attachment 197862 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=197862&action=review
> 
> > Source/WebKit2/ChangeLog:1
> > +2013-04-12  Rafael Brandao  <rafael.lobo@openbossa.org>
> 
> You should perhaps mention that it is based on previous patches by FOO or BAR...

Indeed, I did not figure out how to that so I've put his name on the patch too. Other thing we could do is to split the crash fix in another bug since they seem unrelated.
Comment 16 Rafael Brandao 2013-04-12 10:37:58 PDT
(In reply to comment #11)
> Created an attachment (id=197862) [details]
> Patch

I share the concern of Anders about busy waiting on thread's completion might be bad but if we try to keep the WorkQueue alive by introducing a RefPtr on workQueueThread's mainloop, then the WorkQueue will never be released even if Connection goes away, and unfortunately we depend on WorkQueue being destroyed to ask the thread to stop.

I'm open to suggestions on alternatives and also propose that we could move this in and solve it separately. After all, busy waiting for thread seems much better than having unexpected behavior when executing layout tests. :(
Comment 17 Rafael Brandao 2013-04-12 19:33:21 PDT
Created attachment 197916 [details]
Patch
Comment 18 Rafael Brandao 2013-04-12 19:34:44 PDT
Anders, I took care of the busy wait by scheduling a release on the WorkQueue. It's also based on how WorkQueueMac works.
Comment 19 Rafael Brandao 2013-04-24 18:29:00 PDT
Created attachment 199575 [details]
Patch
Comment 20 Byungwoo Lee 2013-04-24 20:35:36 PDT
Comment on attachment 199575 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=199575&action=review

> Source/WebKit2/Platform/efl/WorkQueueEfl.cpp:133
>      if (!item)
>          return;
>  
> +    if (m_hasDispatchedRelease)
> +        return;
> +

Why not if (!item || m_hasDispatchedRelease) ?

> Source/WebKit2/Platform/efl/WorkQueueEfl.cpp:268
> +void WorkQueue::release()
> +{
> +    m_threadLoop = false;
> +}
> +
> +void WorkQueue::dispatchRelease()
> +{
> +    if (m_hasDispatchedRelease)
> +        return;
> +
> +    dispatch(WTF::bind(&WorkQueue::release, this));
> +    m_hasDispatchedRelease = true;
> +}

Is there any reason to maintain the two flags (m_threadLoop, m_hasDispatchedRelease)?
Comment 21 Rafael Brandao 2013-04-25 10:22:21 PDT
(In reply to comment #20)
> (From update of attachment 199575 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=199575&action=review
> 
> > Source/WebKit2/Platform/efl/WorkQueueEfl.cpp:133
> >      if (!item)
> >          return;
> >  
> > +    if (m_hasDispatchedRelease)
> > +        return;
> > +
> 
> Why not if (!item || m_hasDispatchedRelease) ?

Sounds good. :)

> 
> > Source/WebKit2/Platform/efl/WorkQueueEfl.cpp:268
> > +void WorkQueue::release()
> > +{
> > +    m_threadLoop = false;
> > +}
> > +
> > +void WorkQueue::dispatchRelease()
> > +{
> > +    if (m_hasDispatchedRelease)
> > +        return;
> > +
> > +    dispatch(WTF::bind(&WorkQueue::release, this));
> > +    m_hasDispatchedRelease = true;
> > +}
> 
> Is there any reason to maintain the two flags (m_threadLoop, m_hasDispatchedRelease)?

Nice catch! After you've asked, I came to think of it and really I could use only one. Thanks! I will upload a new version.
Comment 22 Rafael Brandao 2013-04-25 10:23:01 PDT
Created attachment 199682 [details]
Patch
Comment 23 Byungwoo Lee 2013-04-25 22:01:36 PDT
Comment on attachment 199682 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=199682&action=review

> Source/WebKit2/Platform/CoreIPC/unix/ConnectionUnix.cpp:153
> +    m_connectionQueue->dispatchRelease();

Is there any reason to call dispatchRelease() instead of release()?

> Source/WebKit2/Platform/efl/WorkQueueEfl.cpp:65
> -    sendMessageToThread(finishThreadMessage);
> +    dispatchRelease();

Why should this be changed?
Both functions seems to do the same operation.

> Source/WebKit2/Platform/efl/WorkQueueEfl.cpp:190
> +    // One more run to assure we'll clean up all remaining tasks.
> +    workQueue->performWork();
> +    ASSERT(workQueue->m_workItemQueue.isEmpty());
> +    workQueue->performTimerWork();
> +    ASSERT(workQueue->m_timerWorkItems.isEmpty());
> +

Why should this be added?
Comment 24 Rafael Brandao 2013-04-26 16:55:06 PDT
Thanks again for your review! :)

(In reply to comment #23)
> (From update of attachment 199682 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=199682&action=review
> 
> > Source/WebKit2/Platform/CoreIPC/unix/ConnectionUnix.cpp:153
> > +    m_connectionQueue->dispatchRelease();
> 
> Is there any reason to call dispatchRelease() instead of release()?

Hm, maybe because the proper release will not happen immediately. Also I believe it matches more how WorkQueueMac works.

> 
> > Source/WebKit2/Platform/efl/WorkQueueEfl.cpp:65
> > -    sendMessageToThread(finishThreadMessage);
> > +    dispatchRelease();
> 
> Why should this be changed?
> Both functions seems to do the same operation.

Great catch! I've changed the patch so a "dispatchRelease" will just send the finish message to the thread.

> 
> > Source/WebKit2/Platform/efl/WorkQueueEfl.cpp:190
> > +    // One more run to assure we'll clean up all remaining tasks.
> > +    workQueue->performWork();
> > +    ASSERT(workQueue->m_workItemQueue.isEmpty());
> > +    workQueue->performTimerWork();
> > +    ASSERT(workQueue->m_timerWorkItems.isEmpty());
> > +
> 
> Why should this be added?

This is the trickiest part so far, and I didn't find a clean solution yet; in fact, that code is not totally right.

At this point, we no longer have a loop to run the queue. If for any concurrency issues we get a successful dispatch after we've already turned off the loop, then we will never clean up the queues and leak whatever was being referenced there and the work queue (each item has a ref()).

Initially I've thought running both queues would take care of the problem: I would go for each item and deref() them. But the problem I can still think of a case where a random thread calls a dispatch before we turn the loop off and passes the check there, then on the main thread we ask to release and then the work queue's thread set its loop off and clean its queues. If we come back to the random thread, we would continue to run the dispatch and enqueue whatever was there, again producing a leak. This is theoretical, I can't think of how I can simulate this, but I believe it's a valid concern. I'll think more about this later.
Comment 25 Byungwoo Lee 2013-04-27 07:58:30 PDT
Comment on attachment 199682 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=199682&action=review

>>> Source/WebKit2/Platform/CoreIPC/unix/ConnectionUnix.cpp:153
>>> +    m_connectionQueue->dispatchRelease();
>> 
>> Is there any reason to call dispatchRelease() instead of release()?
> 
> Hm, maybe because the proper release will not happen immediately. Also I believe it matches more how WorkQueueMac works.

As my understanding, Connection::platformInvalidate() function works on the WorkQueue thread.
I still cannot clearly understand why the delay should be needed.
(Why should workqueue thread dispatch release() function to itself)

>>> Source/WebKit2/Platform/efl/WorkQueueEfl.cpp:190
>>> +
>> 
>> Why should this be added?
> 
> This is the trickiest part so far, and I didn't find a clean solution yet; in fact, that code is not totally right.
> 
> At this point, we no longer have a loop to run the queue. If for any concurrency issues we get a successful dispatch after we've already turned off the loop, then we will never clean up the queues and leak whatever was being referenced there and the work queue (each item has a ref()).
> 
> Initially I've thought running both queues would take care of the problem: I would go for each item and deref() them. But the problem I can still think of a case where a random thread calls a dispatch before we turn the loop off and passes the check there, then on the main thread we ask to release and then the work queue's thread set its loop off and clean its queues. If we come back to the random thread, we would continue to run the dispatch and enqueue whatever was there, again producing a leak. This is theoretical, I can't think of how I can simulate this, but I believe it's a valid concern. I'll think more about this later.

What is the difference between 'calling performWorks()' and 'deref() for each queue items' ?
Can the concurrency issue that you mentioned be solved with this solution?
And performWorks() and performTimerWork() will dispatch work items.
Why should the items be dispatched after the thread loop is ended?
Comment 26 Anders Carlsson 2013-04-27 10:57:37 PDT
I’m still not sure if this is the right approach. Keep in mind that WorkQueue needs to support doing something like

WorkQueue::create(“whatever”)->dispatch(myFunction); 

and after myFunction has successfully executed, the WorkQueue should be destroyed.
Comment 27 Byungwoo Lee 2013-04-28 12:38:34 PDT
With the comment from andersca, I found a problem of current efl workqueue ref()/deref() implementation. 
(deref() during performing work item can make dangling pointer) 
I filed a bug 115332 separately.
Comment 28 Byungwoo Lee 2013-04-29 03:08:30 PDT
(In reply to comment #27)
> With the comment from andersca, I found a problem of current efl workqueue ref()/deref() implementation. 
> (deref() during performing work item can make dangling pointer) 
> I filed a bug 115332 separately.

After filed the new bug, I read this patch and change-log again.
And I found that this patch seems to try to fix similar problem also.
(Initially, I cannot understand the purpose of this patch fully.
Sorry about this)

I didn't consider about the problem between Connection - WorkQueue (that you mentioned) in the new patch.
But for the WorkQueue ref()/deref() problem,
I think new approach (bug 115332) is more clear than this.

Is it ok to separate the two issues? How about your opinion?
Comment 29 Rafael Brandao 2013-05-15 17:18:11 PDT
Created attachment 201899 [details]
Patch
Comment 30 Rafael Brandao 2013-05-15 17:22:24 PDT
(In reply to comment #28)
> (In reply to comment #27)
> > With the comment from andersca, I found a problem of current efl workqueue ref()/deref() implementation. 
> > (deref() during performing work item can make dangling pointer) 
> > I filed a bug 115332 separately.
> 
> After filed the new bug, I read this patch and change-log again.
> And I found that this patch seems to try to fix similar problem also.
> (Initially, I cannot understand the purpose of this patch fully.
> Sorry about this)
> 
> I didn't consider about the problem between Connection - WorkQueue (that you mentioned) in the new patch.
> But for the WorkQueue ref()/deref() problem,
> I think new approach (bug 115332) is more clear than this.
> 
> Is it ok to separate the two issues? How about your opinion?

I think both are trying to solve the same issue: WorkQueue being deleted while its runLoop uses it to do stuff. I have decided to go back to a simpler approach (busy-wait on the parent thread) because it is small change, easy to understand, and we can try to remove it in the long term. But my attempts so far lead to even more complicated concurrency issue, so I believe we should do something similar to what Mac port does rather than creating a thread per WorkQueue.

Could you check if you can reproduce your issue with the latest patch?
Comment 31 Byungwoo Lee 2013-05-18 21:19:58 PDT
Comment on attachment 201899 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=201899&action=review

(In reply to comment #30)
> I think both are trying to solve the same issue: WorkQueue being deleted while its runLoop uses it to do stuff.

Yes, I agree with that, both share the same issue, and this is the initial bug for the issue, it'll be better to fix it with this bug :)

I'll try to keep refactoring issue back, and focus the issue of this bug.

> Source/WebKit2/ChangeLog:19
> +        A following issue was that Connection's WorkQueue's thread was still running after
> +        the WorkQueue was released and that thread could use its dangling pointer.
> +        This happens because each WorkQueue creates its own thread and pass its
> +        runLoop function to be executed in there. The solution to this dangling
> +        pointer was to wait on the parent thread for the end of the child thread
> +        before moving forward with WorkQueue destructor. Qt also does similar approach.

I think waiting thread terminate is not enough to handle the dangling WorkQueue problem.
As my understanding, WorkQueue::platformInvalidate() can be invoked in the workQueue Thread.
I think this approach cannot handle the case when the WorkQueue is invalidated on the workQueue thread.

> Source/WebKit2/Platform/efl/WorkQueueEfl.cpp:209
> +    m_socketEventHandler = Function<void()>();

This will make assertion.
Comment 32 Byungwoo Lee 2013-05-21 19:08:53 PDT
For the clear point about the WorkQueue dangling pointer issue,
I want to share my understanding.

As anders commented on the #26,
WorkQueue should support below use case.
   WorkQueue::create(“whatever”)->dispatch(myFunction);

This means,
The WorkQueue instance from WorkQueue::create("whatever") will create an event loop, and myFunction will be requested to dispatch on the event loop, and the WorkQueue instance will be de-referenced right after it.

The request will be processed on the event loop by an async way, so,

If myFunction was finished before the de-ref of WorkQueue instance, the event loop is in the idle state, so the de-ref will delete WorkQueue instance and the deletion should trigger event loop termination.

If myFunction is processing at the moment of de-ref of WorkQueue instance,
the WorkQueue should not be deleted because WorkQueue is currently processing.
To ensure this, event loop should keep the WorkQueue reference while processing the WorkQueue. (I think this is the reason that WorkQueue is ref-counted object.)
And after myFunction is finished (the WorkQueue processing is finished) in the event loop, the event loop will de-ref the WorkQueue. The de-ref will delete the WorkQueue instance and the deletion of the WorkQueue instance should trigger event loop termination.

In my opinion, to prevent the dangling pointer problem, it will be better to find the right way to handle WorkQueue reference count.
Comment 33 Rafael Brandao 2013-05-22 12:58:03 PDT
(In reply to comment #32)
> For the clear point about the WorkQueue dangling pointer issue,
> I want to share my understanding.
> 
> As anders commented on the #26,
> WorkQueue should support below use case.
>    WorkQueue::create(“whatever”)->dispatch(myFunction);
> 
> This means,
> The WorkQueue instance from WorkQueue::create("whatever") will create an event loop, and myFunction will be requested to dispatch on the event loop, and the WorkQueue instance will be de-referenced right after it.
> 
> The request will be processed on the event loop by an async way, so,
> 
> If myFunction was finished before the de-ref of WorkQueue instance, the event loop is in the idle state, so the de-ref will delete WorkQueue instance and the deletion should trigger event loop termination.
> 
> If myFunction is processing at the moment of de-ref of WorkQueue instance,
> the WorkQueue should not be deleted because WorkQueue is currently processing.
> To ensure this, event loop should keep the WorkQueue reference while processing the WorkQueue. (I think this is the reason that WorkQueue is ref-counted object.)
> And after myFunction is finished (the WorkQueue processing is finished) in the event loop, the event loop will de-ref the WorkQueue. The de-ref will delete the WorkQueue instance and the deletion of the WorkQueue instance should trigger event loop termination.
> 
> In my opinion, to prevent the dangling pointer problem, it will be better to find the right way to handle WorkQueue reference count.

You are right. I had the wrong assumption that destruction would always happen on mainThread instead of the workQueueThread. This detail makes the whole thing even more complicated.

The problem about keeping a reference on the WorkQueueThread is that if we are stuck waiting for sync I/O, we will just leak because WorkQueue::platformInvalidate will never be called.

I was trying to workaround this issue and got a WIP patch (https://gist.github.com/rafaelbrandao/81929a88aec886d5457e), but running on debug mode can still make me get into WTF::Lock asserts while executing WorkQueue::performWork (WorkQueue was already deleted). And looking at your initial approach in the other bug, they are very similar. Still I don't think this is the right way to fix it, as Christophe Dumez pointed out it would just make the problem less frequent.

Maybe we should go forward with a design similar to what Mac port does, but I'm not sure how we can get it without the usage of anonymous functions (check WorkQueueMac). Those functions would be directed to something else that will manage which thread will run each function and also manage the amount of threads (it could depend on the amount of cores).

Feel free to move forward with fixing this bug, I'm stuck with other stuff and I understand this should be fixed as soon as possible.
Comment 34 Rafael Brandao 2013-05-22 17:31:14 PDT
Thanks a lot for your review. :)

(In reply to comment #31)
> (From update of attachment 201899 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=201899&action=review
> 
> (In reply to comment #30)
> > I think both are trying to solve the same issue: WorkQueue being deleted while its runLoop uses it to do stuff.
> 
> Yes, I agree with that, both share the same issue, and this is the initial bug for the issue, it'll be better to fix it with this bug :)
> 
> I'll try to keep refactoring issue back, and focus the issue of this bug.

I agree that we should try to leave refactoring for later, but now I'm unsure of how we should fix this. :-(

> 
> > Source/WebKit2/ChangeLog:19
> > +        A following issue was that Connection's WorkQueue's thread was still running after
> > +        the WorkQueue was released and that thread could use its dangling pointer.
> > +        This happens because each WorkQueue creates its own thread and pass its
> > +        runLoop function to be executed in there. The solution to this dangling
> > +        pointer was to wait on the parent thread for the end of the child thread
> > +        before moving forward with WorkQueue destructor. Qt also does similar approach.
> 
> I think waiting thread terminate is not enough to handle the dangling WorkQueue problem.
> As my understanding, WorkQueue::platformInvalidate() can be invoked in the workQueue Thread.
> I think this approach cannot handle the case when the WorkQueue is invalidated on the workQueue thread.

Indeed. When I compiled in debug mode I could finally see it crashing. Thanks for the notice!

> 
> > Source/WebKit2/Platform/efl/WorkQueueEfl.cpp:209
> > +    m_socketEventHandler = Function<void()>();
> 
> This will make assertion.

Why it would assert? This function is called on WorkQueueThread when Connection is about to close. This handler was holding a reference to the WorkQueue on WorkQueueThread so without this change, WorkQueue would never be destroyed.
Comment 35 Byungwoo Lee 2013-05-22 18:22:54 PDT
Comment on attachment 201899 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=201899&action=review

>>> Source/WebKit2/Platform/efl/WorkQueueEfl.cpp:209
>>> +    m_socketEventHandler = Function<void()>();
>> 
>> This will make assertion.
> 
> Why it would assert? This function is called on WorkQueueThread when Connection is about to close. This handler was holding a reference to the WorkQueue on WorkQueueThread so without this change, WorkQueue would never be destroyed.

I thought that 'unregisterSocketEventHandler() can be invoked on a thread which is not the workqueue thread.
If so, m_socketEventHandler() will make assersion if m_socketEventHandler is Function<void()>() because there is any locking for m_socketEventHandler.

But WorkQueue::unregisterSocketEventHandler() will only work on the workqueue thread because Connection::platformInvalidate() is always working on the workqueue thread.
So, setting m_socketDescriptor as invalidSocketDescriptor will be enough to prevent calling m_socketEventHandler() because the value is checked before calling m_socketEventHandler().
I think this is redundant.
Comment 36 Byungwoo Lee 2013-05-22 18:35:30 PDT
(In reply to comment #35)
> (From update of attachment 201899 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=201899&action=review
> 
> >>> Source/WebKit2/Platform/efl/WorkQueueEfl.cpp:209
> >>> +    m_socketEventHandler = Function<void()>();
> >> 
> >> This will make assertion.
> > 
> > Why it would assert? This function is called on WorkQueueThread when Connection is about to close. This handler was holding a reference to the WorkQueue on WorkQueueThread so without this change, WorkQueue would never be destroyed.
> 
> I thought that 'unregisterSocketEventHandler() can be invoked on a thread which is not the workqueue thread.
> If so, m_socketEventHandler() will make assersion if m_socketEventHandler is Function<void()>() because there is any locking for m_socketEventHandler.
> 
> But WorkQueue::unregisterSocketEventHandler() will only work on the workqueue thread because Connection::platformInvalidate() is always working on the workqueue thread.
> So, setting m_socketDescriptor as invalidSocketDescriptor will be enough to prevent calling m_socketEventHandler() because the value is checked before calling m_socketEventHandler().
> I think this is redundant.

And I cannot understand that 'This handler was holding a reference to the WorkQueue on WorkQueueThread' I cannot find any workqueue referencing about m_socketEventHandler.
Comment 37 Rafael Brandao 2013-05-22 22:23:54 PDT
Comment on attachment 201899 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=201899&action=review

>>>>> Source/WebKit2/Platform/efl/WorkQueueEfl.cpp:209
>>>>> +    m_socketEventHandler = Function<void()>();
>>>> 
>>>> This will make assertion.
>>> 
>>> Why it would assert? This function is called on WorkQueueThread when Connection is about to close. This handler was holding a reference to the WorkQueue on WorkQueueThread so without this change, WorkQueue would never be destroyed.
>> 
>> I thought that 'unregisterSocketEventHandler() can be invoked on a thread which is not the workqueue thread.
>> If so, m_socketEventHandler() will make assersion if m_socketEventHandler is Function<void()>() because there is any locking for m_socketEventHandler.
>> 
>> But WorkQueue::unregisterSocketEventHandler() will only work on the workqueue thread because Connection::platformInvalidate() is always working on the workqueue thread.
>> So, setting m_socketDescriptor as invalidSocketDescriptor will be enough to prevent calling m_socketEventHandler() because the value is checked before calling m_socketEventHandler().
>> I think this is redundant.
> 
> And I cannot understand that 'This handler was holding a reference to the WorkQueue on WorkQueueThread' I cannot find any workqueue referencing about m_socketEventHandler.

Sorry, it's not WorkQueue's reference being held on WorkQueueThread, but Connection's reference. Check on ConnectionUnix where registerSocketEventHandler is called. It does a WTF::bind of a Connection's function: this increases ref count of Connection until that function is cleared. If we don't clear m_socketEventHandler on unregisterSocketEventHandler, then a reference to Connection will keep him alive because of its own WorkQueue and then both Connection and WorkQueue will leak (thanks to Lauro for fixing it!).
Comment 38 Byungwoo Lee 2013-05-23 19:40:00 PDT
Comment on attachment 201899 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=201899&action=review

>>>>>> Source/WebKit2/Platform/efl/WorkQueueEfl.cpp:209
>>>>>> +    m_socketEventHandler = Function<void()>();
>>>>> 
>>>>> This will make assertion.
>>>> 
>>>> Why it would assert? This function is called on WorkQueueThread when Connection is about to close. This handler was holding a reference to the WorkQueue on WorkQueueThread so without this change, WorkQueue would never be destroyed.
>>> 
>>> I thought that 'unregisterSocketEventHandler() can be invoked on a thread which is not the workqueue thread.
>>> If so, m_socketEventHandler() will make assersion if m_socketEventHandler is Function<void()>() because there is any locking for m_socketEventHandler.
>>> 
>>> But WorkQueue::unregisterSocketEventHandler() will only work on the workqueue thread because Connection::platformInvalidate() is always working on the workqueue thread.
>>> So, setting m_socketDescriptor as invalidSocketDescriptor will be enough to prevent calling m_socketEventHandler() because the value is checked before calling m_socketEventHandler().
>>> I think this is redundant.
>> 
>> And I cannot understand that 'This handler was holding a reference to the WorkQueue on WorkQueueThread' I cannot find any workqueue referencing about m_socketEventHandler.
> 
> Sorry, it's not WorkQueue's reference being held on WorkQueueThread, but Connection's reference. Check on ConnectionUnix where registerSocketEventHandler is called. It does a WTF::bind of a Connection's function: this increases ref count of Connection until that function is cleared. If we don't clear m_socketEventHandler on unregisterSocketEventHandler, then a reference to Connection will keep him alive because of its own WorkQueue and then both Connection and WorkQueue will leak (thanks to Lauro for fixing it!).

Ok, Now I can understand about the reference chain between WorkQueue and Connection with m_socketEventHandler (Thanks!), and clearing m_socketEventHandler looks reasonable.

How about separating the workqueue problem from this patch?
Clearing m_socketEventHandler looks right approach to prevent the leaking of the Connection, but dangling work queue problem looks a different issue that can be handled on another bug.
Comment 39 Rafael Brandao 2013-05-24 11:03:02 PDT
(In reply to comment #38)
> (From update of attachment 201899 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=201899&action=review
> How about separating the workqueue problem from this patch?
> Clearing m_socketEventHandler looks right approach to prevent the leaking of the Connection, but dangling work queue problem looks a different issue that can be handled on another bug.

Sigh. I totally agree with this; in fact, I had also proposed that earlier. But maybe we should be careful about landing Connection fix without WorkQueue's destruction fix since one exposes the other. It will probably make our testing suit a mess (random crashes).
Comment 40 Byungwoo Lee 2013-05-24 18:58:13 PDT
(In reply to comment #39)
> (In reply to comment #38)
> > (From update of attachment 201899 [details] [details])
> > View in context: https://bugs.webkit.org/attachment.cgi?id=201899&action=review
> > How about separating the workqueue problem from this patch?
> > Clearing m_socketEventHandler looks right approach to prevent the leaking of the Connection, but dangling work queue problem looks a different issue that can be handled on another bug.
> 
> Sigh. I totally agree with this; in fact, I had also proposed that earlier. But maybe we should be careful about landing Connection fix without WorkQueue's destruction fix since one exposes the other. It will probably make our testing suit a mess (random crashes).

Then, How about creating a new bug for the dangling workqueue and setting dependency with this?
Comment 41 Nick Diego Yamane (diegoyam) 2013-06-06 16:47:39 PDT
I've just made some tests using the latest version of patch from https://bugs.webkit.org/show_bug.cgi?id=115332 + the WorkQueue's socketEventHandler cleanup at 'unregisterSocketEventHandler' function (proposed previously here).
I used this API test case (https://github.com/WebKitNix/webkitnix/blob/master/Tools/TestWebKitAPI/Tests/nix/WebViewWebProcessCrashed.cpp) from WebKitNix port, which forces WebProcess to crash/reload sequentially multiple times. Bellow some details about the results:

New implementation based on 'DispatchQueue' seems to be much more robust and easy to maintain/extend than the previous one, however at some point (usually after 400 crash/respawns) the test crashes with a ASSERTION fail at WTF::Mutex's destructor. More details here, gdb session log: http://paste.debian.net/8904/).

From that debug session I could see that a call to pthread_mutex_destroy() at Mutex's destructor was returning error code 16, which most probably means it was trying to destroy a locked mutex[1]. m_writeToPipeDescriptorLock was the mutex causing the problem. So, I propose (after the mentioned patch has been landed) to protect the DispatchQueue destruction using a MutexLocker on that mutex to prevent this concurrency issue between ~DispatchQueue and DispatchQueue::wakeup().

[1] http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_mutex_init.html
Comment 42 8179.jeong@gmail.com 2013-09-27 18:54:48 PDT
*** Bug 122044 has been marked as a duplicate of this bug. ***
Comment 43 Michael Catanzaro 2017-03-11 10:31:36 PST
Closing this bug because the EFL port has been removed from trunk.

If you feel this bug applies to a different upstream WebKit port and was closed in error, please either update the title and reopen the bug, or leave a comment to request this.