144824 – [GTK] WorkQueue objects are not released

RESOLVED FIXED 144824

[GTK] WorkQueue objects are not released

https://bugs.webkit.org/show_bug.cgi?id=144824

Summary [GTK] WorkQueue objects are not released

Michael Catanzaro

Reported 2015-05-08 18:50:54 PDT

**FAIL** WTF_WorkQueue.Simple ../../Tools/TestWebKitAPI/Tests/WTF/WorkQueue.cpp:57 Value of: initialRefCount Actual: 2 Expected: 1 This is not a bug: we need the second ref to keep the WorkQueue alive in the lambda function where the main loop runs in another thread, in WorkQueue::platformInitialize: RefPtr<WorkQueue> protector(this); m_workQueueThread = createThread(threadName, [protector] { g_main_context_push_thread_default(protector->m_eventContext.get()); g_main_loop_run(protector->m_eventLoop.get()); g_main_context_pop_thread_default(protector->m_eventContext.get()); }); Since we need to keep the WorkQueue alive in another thread, I think the only option is to change the test, not the implementation. CC Brent since he wrote this test. We could guard the reference count checks with #if !PLATFORM(GTK), or just remove the checks completely.

Attachments
Patch (3.39 KB, patch) 2015-05-09 08:21 PDT, Michael Catanzaro	no flags	Details Formatted Diff Diff
Patch (3.49 KB, patch) 2015-05-09 10:35 PDT, Michael Catanzaro	no flags	Details Formatted Diff Diff
Patch (3.14 KB, patch) 2015-05-10 04:09 PDT, Carlos Garcia Campos	no flags	Details Formatted Diff Diff
Updated patch (3.71 KB, patch) 2015-05-11 02:39 PDT, Carlos Garcia Campos	zan: review+	Details Formatted Diff Diff
Show Obsolete (3) View All Add attachment proposed patch, testcase, etc.

Michael Catanzaro

Comment 1 2015-05-08 18:59:57 PDT

So all of the reference count checks in this test check the initial reference count, except for this one: EXPECT_GT(queue->refCount(), 1); m_testCompleted.wait(m_lock); But that one would fail (on any platform) if each task completes prior to the assertion being run.

Michael Catanzaro

Comment 2 2015-05-09 08:21:44 PDT

Created attachment 252775 [details] Patch

Brent Fulgham

Comment 3 2015-05-09 09:51:35 PDT

Comment on attachment 252775 [details] Patch It doesn't look like we have the right behavior on iOS/mac with this change. :(

Brent Fulgham

Comment 4 2015-05-09 09:53:28 PDT

Comment on attachment 252775 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=252775&action=review > Tools/TestWebKitAPI/Tests/WTF/WorkQueue.cpp:71 > + EXPECT_GT(queue->refCount(), expectedInitialRefcount); I think I originally tried to have something like this, but ran into problems with race conditions.

Brent Fulgham

Comment 5 2015-05-09 09:55:46 PDT

Comment on attachment 252775 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=252775&action=review >> Tools/TestWebKitAPI/Tests/WTF/WorkQueue.cpp:71 >> + EXPECT_GT(queue->refCount(), expectedInitialRefcount); > > I think I originally tried to have something like this, but ran into problems with race conditions. Never mind -- this might be ok. It's just clang is unhappy with the difference in sign. I think refCount has a different type (maybe size_t?)

Michael Catanzaro

Comment 6 2015-05-09 10:34:29 PDT

The refcount is an unsigned int, but it was being stored in a signed int local variable. I'll switch the local variable declaration to use auto instead. Hopefully that will placate clang.

Michael Catanzaro

Comment 7 2015-05-09 10:35:52 PDT

Created attachment 252780 [details] Patch

Zan Dobersek

Comment 8 2015-05-09 10:56:09 PDT

So, in WorkQueue::platformInitialize() we create a thread and pass in a lambda that captured a RefPtr<WorkQueue>, essentially ensuring it doesn't get destroyed during that thread's lifetime. In createThread() the lambda (now wrapped in std::function<>) is stored on the heap, in the NewThreadContext object. That is again accessed on the newly-created thread, in threadEntryPoint(), moving the std::function<> off the heap, deleting the NewThreadContext, and passing the thread control to said std::function<> which is now spinning the WorkQueue's m_eventLoop. The kicker is that because a reference to the WorkQueue is held on the WorkQueue-owned thread, the WorkQueue and that thread don't get destroyed when the user drops his last reference to it, as he'd expect. We also don't interrupt the main loop anywhere else but the WorkQueue destructor (well, ::platformInvalidate()). I haven't confirmed this analysis in real world, but if it's correct, we're not destroying any WorkQueue and the underlying thread that we create.

Martin Robinson

Comment 9 2015-05-09 13:08:52 PDT

(In reply to comment #8) > I haven't confirmed this analysis in real world, but if it's correct, we're > not destroying any WorkQueue and the underlying thread that we create. So this sounds like a memory leak, no?

Michael Catanzaro

Comment 10 2015-05-09 15:47:54 PDT

As a sanity check I added this to WorkQueue.cpp: #ifndef NDEBUG #include <wtf/RefCountedLeakCounter.h> #endif DEFINE_DEBUG_ONLY_GLOBAL(WTF::RefCountedLeakCounter, workQueueCounter, ("WorkQueue")); Then browsed through a few pages on a debug build. We seem to be leaking everything except WorkQueues.

Michael Catanzaro

Comment 11 2015-05-09 15:58:27 PDT

(In reply to comment #10) > As a sanity check I added this to WorkQueue.cpp: It wasn't a good sanity check because I forgot to increment and decrement the counter in the constructor/destructor. We leak 6 WorkQueues per WebView.

Michael Catanzaro

Comment 12 2015-05-09 18:01:29 PDT

A solution that seems seductively simple: GRefPtr<GMainContext> eventContextProtector(m_eventContext.get()); GRefPtr<GMainLoop> mainLoopProtector(m_eventLoop.get()); m_workQueueThread = createThread(threadName, [eventContextProtector, mainLoopProtector] { g_main_context_push_thread_default(eventContextProtector.get()); g_main_loop_run(mainLoopProtector.get()); g_main_context_pop_thread_default(eventContextProtector.get()); }); Looks good to me...? This fixes the test, but only brings us down to 5 leaks per WebView, so we're still losing a bunch of threads. Aside: Is there a reason to pop the thread default main context immediately before the thread ends, Carlos? Aside: Thanks for the test, Brent.

Carlos Garcia Campos

Comment 13 2015-05-10 02:53:59 PDT

(In reply to comment #8) > So, in WorkQueue::platformInitialize() we create a thread and pass in a > lambda that captured a RefPtr<WorkQueue>, essentially ensuring it doesn't > get destroyed during that thread's lifetime. Yes, but that was not the original intention, see bug #140998. We wanted to make sure that the thread was detached when the work queue is destroyed before the thread body has started. Since the WorkQueue was changed to be refcounted, every source takes a reference of the workQueue, so we don't really need to keep an extra reference for the whole like of the worker thread. I think the right solution could be to go back to the first version of the patch in bug #140998.

Carlos Garcia Campos

Comment 14 2015-05-10 03:34:51 PDT

(In reply to comment #12) > A solution that seems seductively simple: > > GRefPtr<GMainContext> eventContextProtector(m_eventContext.get()); > GRefPtr<GMainLoop> mainLoopProtector(m_eventLoop.get()); > m_workQueueThread = createThread(threadName, [eventContextProtector, > mainLoopProtector] { > g_main_context_push_thread_default(eventContextProtector.get()); > g_main_loop_run(mainLoopProtector.get()); > g_main_context_pop_thread_default(eventContextProtector.get()); > }); > > Looks good to me...? This fixes the test, but only brings us down to 5 leaks > per WebView, so we're still losing a bunch of threads. > > Aside: Is there a reason to pop the thread default main context immediately > before the thread ends, Carlos? Actually, we don't even need to pop the context, since we push it right after creating the thread and the thread-specific context queue will be freed when the thread exits. > Aside: Thanks for the test, Brent.

Carlos Garcia Campos

Comment 15 2015-05-10 03:51:40 PDT

(In reply to comment #14) > (In reply to comment #12) > > A solution that seems seductively simple: > > > > GRefPtr<GMainContext> eventContextProtector(m_eventContext.get()); > > GRefPtr<GMainLoop> mainLoopProtector(m_eventLoop.get()); > > m_workQueueThread = createThread(threadName, [eventContextProtector, > > mainLoopProtector] { > > g_main_context_push_thread_default(eventContextProtector.get()); > > g_main_loop_run(mainLoopProtector.get()); > > g_main_context_pop_thread_default(eventContextProtector.get()); > > }); > > > > Looks good to me...? This fixes the test, but only brings us down to 5 leaks > > per WebView, so we're still losing a bunch of threads. > > > > Aside: Is there a reason to pop the thread default main context immediately > > before the thread ends, Carlos? > > Actually, we don't even need to pop the context, since we push it right > after creating the thread and the thread-specific context queue will be > freed when the thread exits. We could also push/pop for every task that is run in the queue, but i think it's simpler to push right after creating the queue and never pop. > > Aside: Thanks for the test, Brent.

Carlos Garcia Campos

Comment 16 2015-05-10 04:09:01 PDT

Created attachment 252812 [details] Patch This should fix the test and the work queue leak.

Michael Catanzaro

Comment 17 2015-05-10 08:37:33 PDT

Carlos, I would change the first line of your commit message: "[GTK] WorkQueue runs forever" would be more descriptive. I tried to write a test to ensure the thread actually quits, but it's hard: detaching the thread in the destructor makes it impossible to join on it to ensure it completes, and counting threads in /proc is racy without a join, so we'd have to add new API to WorkQueue to change the behavior of the destructor just for such a test... and then we're not really testing the same functionality anymore. So I think we should be content with the existing refcount test.

Carlos Garcia Campos

Comment 18 2015-05-11 02:39:03 PDT

Created attachment 252853 [details] Updated patch Zan pointed out that the thread could keep running forever if the work queue is released before the thread has run the main loop. This updated patch fixes that case by scheduling a main loop quit when the work queue is released before the thread has started. As Michael said, this is not easy to test, I ran some custom tests with printfs to verify the behaviour.

Zan Dobersek

Comment 19 2015-05-11 02:50:36 PDT

Comment on attachment 252853 [details] Updated patch View in context: https://bugs.webkit.org/attachment.cgi?id=252853&action=review > Source/WTF/wtf/gtk/WorkQueueGtk.cpp:72 > if (m_eventLoop) { This check shouldn't be necessary, we ASSERT the m_eventLoop existence in the constructor and don't destroy it anywhere.

Carlos Garcia Campos

Comment 20 2015-05-11 02:53:58 PDT

(In reply to comment #19) > Comment on attachment 252853 [details] > Updated patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=252853&action=review > > > Source/WTF/wtf/gtk/WorkQueueGtk.cpp:72 > > if (m_eventLoop) { > > This check shouldn't be necessary, we ASSERT the m_eventLoop existence in > the constructor and don't destroy it anywhere. Yes, this is because in the past the platformInvalidate was not called by the destructor, but by invalidate method called by the user.

Carlos Garcia Campos

Comment 21 2015-05-11 03:12:01 PDT

I'm going to leave the checks in the end, to ensure it keeps working even if the implementation changes again and platformInvalidate is moved somewhere else.

Carlos Garcia Campos

Comment 22 2015-05-11 03:23:12 PDT

Committed r184072: <http://trac.webkit.org/changeset/184072>

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware PC

OS Linux

Product WebKit

Component Tools / Tests

Assignee

Michael Catanzaro

Reported

2015-05-08 18:50 PDT

Modified

2015-05-11 03:23 PDT History

CC List

8 users Show

URL

Keywords

Depends on

Blocks