Bug 144824 - [GTK] WorkQueue objects are not released
Summary: [GTK] WorkQueue objects are not released
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests (show other bugs)
Version: 528+ (Nightly build)
Hardware: PC Linux
: P2 Normal
Assignee: Michael Catanzaro
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-05-08 18:50 PDT by Michael Catanzaro
Modified: 2015-05-11 03:23 PDT (History)
8 users (show)

See Also:


Attachments
Patch (3.39 KB, patch)
2015-05-09 08:21 PDT, Michael Catanzaro
no flags Details | Formatted Diff | Diff
Patch (3.49 KB, patch)
2015-05-09 10:35 PDT, Michael Catanzaro
no flags Details | Formatted Diff | Diff
Patch (3.14 KB, patch)
2015-05-10 04:09 PDT, Carlos Garcia Campos
no flags Details | Formatted Diff | Diff
Updated patch (3.71 KB, patch)
2015-05-11 02:39 PDT, Carlos Garcia Campos
zan: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Michael Catanzaro 2015-05-08 18:50:54 PDT
**FAIL** WTF_WorkQueue.Simple

../../Tools/TestWebKitAPI/Tests/WTF/WorkQueue.cpp:57

Value of: initialRefCount

  Actual: 2

Expected: 1

This is not a bug: we need the second ref to keep the WorkQueue alive in the lambda function where the main loop runs in another thread, in WorkQueue::platformInitialize:

RefPtr<WorkQueue> protector(this);
m_workQueueThread = createThread(threadName, [protector] {
    g_main_context_push_thread_default(protector->m_eventContext.get());
    g_main_loop_run(protector->m_eventLoop.get());
    g_main_context_pop_thread_default(protector->m_eventContext.get());
});

Since we need to keep the WorkQueue alive in another thread, I think the only option is to change the test, not the implementation. CC Brent since he wrote this test. We could guard the reference count checks with #if !PLATFORM(GTK), or just remove the checks completely.
Comment 1 Michael Catanzaro 2015-05-08 18:59:57 PDT
So all of the reference count checks in this test check the initial reference count, except for this one:

    EXPECT_GT(queue->refCount(), 1);

    m_testCompleted.wait(m_lock);

But that one would fail (on any platform) if each task completes prior to the assertion being run.
Comment 2 Michael Catanzaro 2015-05-09 08:21:44 PDT
Created attachment 252775 [details]
Patch
Comment 3 Brent Fulgham 2015-05-09 09:51:35 PDT
Comment on attachment 252775 [details]
Patch

It doesn't look like we have the right behavior on iOS/mac with this change. :(
Comment 4 Brent Fulgham 2015-05-09 09:53:28 PDT
Comment on attachment 252775 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=252775&action=review

> Tools/TestWebKitAPI/Tests/WTF/WorkQueue.cpp:71
> +        EXPECT_GT(queue->refCount(), expectedInitialRefcount);

I think I originally tried to have something like this, but ran into problems with race conditions.
Comment 5 Brent Fulgham 2015-05-09 09:55:46 PDT
Comment on attachment 252775 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=252775&action=review

>> Tools/TestWebKitAPI/Tests/WTF/WorkQueue.cpp:71
>> +        EXPECT_GT(queue->refCount(), expectedInitialRefcount);
> 
> I think I originally tried to have something like this, but ran into problems with race conditions.

Never mind -- this might be ok. It's just clang is unhappy with the difference in sign. I think refCount has a different type (maybe size_t?)
Comment 6 Michael Catanzaro 2015-05-09 10:34:29 PDT
The refcount is an unsigned int, but it was being stored in a signed int local variable. I'll switch the local variable declaration to use auto instead. Hopefully that will placate clang.
Comment 7 Michael Catanzaro 2015-05-09 10:35:52 PDT
Created attachment 252780 [details]
Patch
Comment 8 Zan Dobersek 2015-05-09 10:56:09 PDT
So, in WorkQueue::platformInitialize() we create a thread and pass in a lambda that captured a RefPtr<WorkQueue>, essentially ensuring it doesn't get destroyed during that thread's lifetime.

In createThread() the lambda (now wrapped in std::function<>) is stored on the heap, in the NewThreadContext object. That is again accessed on the newly-created thread, in threadEntryPoint(), moving the std::function<> off the heap, deleting the NewThreadContext, and passing the thread control to said std::function<> which is now spinning the WorkQueue's m_eventLoop.

The kicker is that because a reference to the WorkQueue is held on the WorkQueue-owned thread, the WorkQueue and that thread don't get destroyed when the user drops his last reference to it, as he'd expect. We also don't interrupt the main loop anywhere else but the WorkQueue destructor (well, ::platformInvalidate()).

I haven't confirmed this analysis in real world, but if it's correct, we're not destroying any WorkQueue and the underlying thread that we create.
Comment 9 Martin Robinson 2015-05-09 13:08:52 PDT
(In reply to comment #8)

> I haven't confirmed this analysis in real world, but if it's correct, we're
> not destroying any WorkQueue and the underlying thread that we create.

So this sounds like a memory leak, no?
Comment 10 Michael Catanzaro 2015-05-09 15:47:54 PDT
As a sanity check I added this to WorkQueue.cpp:

#ifndef NDEBUG
#include <wtf/RefCountedLeakCounter.h>
#endif

DEFINE_DEBUG_ONLY_GLOBAL(WTF::RefCountedLeakCounter, workQueueCounter, ("WorkQueue"));

Then browsed through a few pages on a debug build. We seem to be leaking everything except WorkQueues.
Comment 11 Michael Catanzaro 2015-05-09 15:58:27 PDT
(In reply to comment #10)
> As a sanity check I added this to WorkQueue.cpp:

It wasn't a good sanity check because I forgot to increment and decrement the counter in the constructor/destructor. We leak 6 WorkQueues per WebView.
Comment 12 Michael Catanzaro 2015-05-09 18:01:29 PDT
A solution that seems seductively simple:

    GRefPtr<GMainContext> eventContextProtector(m_eventContext.get());
    GRefPtr<GMainLoop> mainLoopProtector(m_eventLoop.get());
    m_workQueueThread = createThread(threadName, [eventContextProtector, mainLoopProtector] {
        g_main_context_push_thread_default(eventContextProtector.get());
        g_main_loop_run(mainLoopProtector.get());
        g_main_context_pop_thread_default(eventContextProtector.get());
    });

Looks good to me...? This fixes the test, but only brings us down to 5 leaks per WebView, so we're still losing a bunch of threads.

Aside: Is there a reason to pop the thread default main context immediately before the thread ends, Carlos?

Aside: Thanks for the test, Brent.
Comment 13 Carlos Garcia Campos 2015-05-10 02:53:59 PDT
(In reply to comment #8)
> So, in WorkQueue::platformInitialize() we create a thread and pass in a
> lambda that captured a RefPtr<WorkQueue>, essentially ensuring it doesn't
> get destroyed during that thread's lifetime.

Yes, but that was not the original intention, see bug #140998. We wanted to make sure that the thread was detached when the work queue is destroyed before the thread body has started. Since the WorkQueue was changed to be refcounted, every source takes a reference of the workQueue, so we don't really need to keep an extra reference for the whole like of the worker thread. I think the right solution could be to go back to the first version of the patch in bug #140998.
Comment 14 Carlos Garcia Campos 2015-05-10 03:34:51 PDT
(In reply to comment #12)
> A solution that seems seductively simple:
> 
>     GRefPtr<GMainContext> eventContextProtector(m_eventContext.get());
>     GRefPtr<GMainLoop> mainLoopProtector(m_eventLoop.get());
>     m_workQueueThread = createThread(threadName, [eventContextProtector,
> mainLoopProtector] {
>         g_main_context_push_thread_default(eventContextProtector.get());
>         g_main_loop_run(mainLoopProtector.get());
>         g_main_context_pop_thread_default(eventContextProtector.get());
>     });
> 
> Looks good to me...? This fixes the test, but only brings us down to 5 leaks
> per WebView, so we're still losing a bunch of threads.
> 
> Aside: Is there a reason to pop the thread default main context immediately
> before the thread ends, Carlos?

Actually, we don't even need to pop the context, since we push it right after creating the thread and the thread-specific context queue will be freed when the thread exits.

> Aside: Thanks for the test, Brent.
Comment 15 Carlos Garcia Campos 2015-05-10 03:51:40 PDT
(In reply to comment #14)
> (In reply to comment #12)
> > A solution that seems seductively simple:
> > 
> >     GRefPtr<GMainContext> eventContextProtector(m_eventContext.get());
> >     GRefPtr<GMainLoop> mainLoopProtector(m_eventLoop.get());
> >     m_workQueueThread = createThread(threadName, [eventContextProtector,
> > mainLoopProtector] {
> >         g_main_context_push_thread_default(eventContextProtector.get());
> >         g_main_loop_run(mainLoopProtector.get());
> >         g_main_context_pop_thread_default(eventContextProtector.get());
> >     });
> > 
> > Looks good to me...? This fixes the test, but only brings us down to 5 leaks
> > per WebView, so we're still losing a bunch of threads.
> > 
> > Aside: Is there a reason to pop the thread default main context immediately
> > before the thread ends, Carlos?
> 
> Actually, we don't even need to pop the context, since we push it right
> after creating the thread and the thread-specific context queue will be
> freed when the thread exits.

We could also push/pop for every task that is run in the queue, but i think it's simpler to push right after creating the queue and never pop.

> > Aside: Thanks for the test, Brent.
Comment 16 Carlos Garcia Campos 2015-05-10 04:09:01 PDT
Created attachment 252812 [details]
Patch

This should fix the test and the work queue leak.
Comment 17 Michael Catanzaro 2015-05-10 08:37:33 PDT
Carlos, I would change the first line of your commit message: "[GTK] WorkQueue runs forever" would be more descriptive.

I tried to write a test to ensure the thread actually quits, but it's hard: detaching the thread in the destructor makes it impossible to join on it to ensure it completes, and counting threads in /proc is racy without a join, so we'd have to add new API to WorkQueue to change the behavior of the destructor just for such a test... and then we're not really testing the same functionality anymore. So I think we should be content with the existing refcount test.
Comment 18 Carlos Garcia Campos 2015-05-11 02:39:03 PDT
Created attachment 252853 [details]
Updated patch

Zan pointed out that the thread could keep running forever if the work queue is released before the thread has run the main loop. This updated patch fixes that case by scheduling a main loop quit when the work queue is released before the thread has started. As Michael said, this is not easy to test, I ran some custom tests with printfs to verify the behaviour.
Comment 19 Zan Dobersek 2015-05-11 02:50:36 PDT
Comment on attachment 252853 [details]
Updated patch

View in context: https://bugs.webkit.org/attachment.cgi?id=252853&action=review

> Source/WTF/wtf/gtk/WorkQueueGtk.cpp:72
>      if (m_eventLoop) {

This check shouldn't be necessary, we ASSERT the m_eventLoop existence in the constructor and don't destroy it anywhere.
Comment 20 Carlos Garcia Campos 2015-05-11 02:53:58 PDT
(In reply to comment #19)
> Comment on attachment 252853 [details]
> Updated patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=252853&action=review
> 
> > Source/WTF/wtf/gtk/WorkQueueGtk.cpp:72
> >      if (m_eventLoop) {
> 
> This check shouldn't be necessary, we ASSERT the m_eventLoop existence in
> the constructor and don't destroy it anywhere.

Yes, this is because in the past the platformInvalidate was not called by the destructor, but by invalidate method called by the user.
Comment 21 Carlos Garcia Campos 2015-05-11 03:12:01 PDT
I'm going to leave the checks in the end, to ensure it keeps working even if the implementation changes again and platformInvalidate is moved somewhere else.
Comment 22 Carlos Garcia Campos 2015-05-11 03:23:12 PDT
Committed r184072: <http://trac.webkit.org/changeset/184072>