**FAIL** WTF_WorkQueue.Simple ../../Tools/TestWebKitAPI/Tests/WTF/WorkQueue.cpp:57 Value of: initialRefCount Actual: 2 Expected: 1 This is not a bug: we need the second ref to keep the WorkQueue alive in the lambda function where the main loop runs in another thread, in WorkQueue::platformInitialize: RefPtr<WorkQueue> protector(this); m_workQueueThread = createThread(threadName, [protector] { g_main_context_push_thread_default(protector->m_eventContext.get()); g_main_loop_run(protector->m_eventLoop.get()); g_main_context_pop_thread_default(protector->m_eventContext.get()); }); Since we need to keep the WorkQueue alive in another thread, I think the only option is to change the test, not the implementation. CC Brent since he wrote this test. We could guard the reference count checks with #if !PLATFORM(GTK), or just remove the checks completely.
So all of the reference count checks in this test check the initial reference count, except for this one: EXPECT_GT(queue->refCount(), 1); m_testCompleted.wait(m_lock); But that one would fail (on any platform) if each task completes prior to the assertion being run.
Created attachment 252775 [details] Patch
Comment on attachment 252775 [details] Patch It doesn't look like we have the right behavior on iOS/mac with this change. :(
Comment on attachment 252775 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=252775&action=review > Tools/TestWebKitAPI/Tests/WTF/WorkQueue.cpp:71 > + EXPECT_GT(queue->refCount(), expectedInitialRefcount); I think I originally tried to have something like this, but ran into problems with race conditions.
Comment on attachment 252775 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=252775&action=review >> Tools/TestWebKitAPI/Tests/WTF/WorkQueue.cpp:71 >> + EXPECT_GT(queue->refCount(), expectedInitialRefcount); > > I think I originally tried to have something like this, but ran into problems with race conditions. Never mind -- this might be ok. It's just clang is unhappy with the difference in sign. I think refCount has a different type (maybe size_t?)
The refcount is an unsigned int, but it was being stored in a signed int local variable. I'll switch the local variable declaration to use auto instead. Hopefully that will placate clang.
Created attachment 252780 [details] Patch
So, in WorkQueue::platformInitialize() we create a thread and pass in a lambda that captured a RefPtr<WorkQueue>, essentially ensuring it doesn't get destroyed during that thread's lifetime. In createThread() the lambda (now wrapped in std::function<>) is stored on the heap, in the NewThreadContext object. That is again accessed on the newly-created thread, in threadEntryPoint(), moving the std::function<> off the heap, deleting the NewThreadContext, and passing the thread control to said std::function<> which is now spinning the WorkQueue's m_eventLoop. The kicker is that because a reference to the WorkQueue is held on the WorkQueue-owned thread, the WorkQueue and that thread don't get destroyed when the user drops his last reference to it, as he'd expect. We also don't interrupt the main loop anywhere else but the WorkQueue destructor (well, ::platformInvalidate()). I haven't confirmed this analysis in real world, but if it's correct, we're not destroying any WorkQueue and the underlying thread that we create.
(In reply to comment #8) > I haven't confirmed this analysis in real world, but if it's correct, we're > not destroying any WorkQueue and the underlying thread that we create. So this sounds like a memory leak, no?
As a sanity check I added this to WorkQueue.cpp: #ifndef NDEBUG #include <wtf/RefCountedLeakCounter.h> #endif DEFINE_DEBUG_ONLY_GLOBAL(WTF::RefCountedLeakCounter, workQueueCounter, ("WorkQueue")); Then browsed through a few pages on a debug build. We seem to be leaking everything except WorkQueues.
(In reply to comment #10) > As a sanity check I added this to WorkQueue.cpp: It wasn't a good sanity check because I forgot to increment and decrement the counter in the constructor/destructor. We leak 6 WorkQueues per WebView.
A solution that seems seductively simple: GRefPtr<GMainContext> eventContextProtector(m_eventContext.get()); GRefPtr<GMainLoop> mainLoopProtector(m_eventLoop.get()); m_workQueueThread = createThread(threadName, [eventContextProtector, mainLoopProtector] { g_main_context_push_thread_default(eventContextProtector.get()); g_main_loop_run(mainLoopProtector.get()); g_main_context_pop_thread_default(eventContextProtector.get()); }); Looks good to me...? This fixes the test, but only brings us down to 5 leaks per WebView, so we're still losing a bunch of threads. Aside: Is there a reason to pop the thread default main context immediately before the thread ends, Carlos? Aside: Thanks for the test, Brent.
(In reply to comment #8) > So, in WorkQueue::platformInitialize() we create a thread and pass in a > lambda that captured a RefPtr<WorkQueue>, essentially ensuring it doesn't > get destroyed during that thread's lifetime. Yes, but that was not the original intention, see bug #140998. We wanted to make sure that the thread was detached when the work queue is destroyed before the thread body has started. Since the WorkQueue was changed to be refcounted, every source takes a reference of the workQueue, so we don't really need to keep an extra reference for the whole like of the worker thread. I think the right solution could be to go back to the first version of the patch in bug #140998.
(In reply to comment #12) > A solution that seems seductively simple: > > GRefPtr<GMainContext> eventContextProtector(m_eventContext.get()); > GRefPtr<GMainLoop> mainLoopProtector(m_eventLoop.get()); > m_workQueueThread = createThread(threadName, [eventContextProtector, > mainLoopProtector] { > g_main_context_push_thread_default(eventContextProtector.get()); > g_main_loop_run(mainLoopProtector.get()); > g_main_context_pop_thread_default(eventContextProtector.get()); > }); > > Looks good to me...? This fixes the test, but only brings us down to 5 leaks > per WebView, so we're still losing a bunch of threads. > > Aside: Is there a reason to pop the thread default main context immediately > before the thread ends, Carlos? Actually, we don't even need to pop the context, since we push it right after creating the thread and the thread-specific context queue will be freed when the thread exits. > Aside: Thanks for the test, Brent.
(In reply to comment #14) > (In reply to comment #12) > > A solution that seems seductively simple: > > > > GRefPtr<GMainContext> eventContextProtector(m_eventContext.get()); > > GRefPtr<GMainLoop> mainLoopProtector(m_eventLoop.get()); > > m_workQueueThread = createThread(threadName, [eventContextProtector, > > mainLoopProtector] { > > g_main_context_push_thread_default(eventContextProtector.get()); > > g_main_loop_run(mainLoopProtector.get()); > > g_main_context_pop_thread_default(eventContextProtector.get()); > > }); > > > > Looks good to me...? This fixes the test, but only brings us down to 5 leaks > > per WebView, so we're still losing a bunch of threads. > > > > Aside: Is there a reason to pop the thread default main context immediately > > before the thread ends, Carlos? > > Actually, we don't even need to pop the context, since we push it right > after creating the thread and the thread-specific context queue will be > freed when the thread exits. We could also push/pop for every task that is run in the queue, but i think it's simpler to push right after creating the queue and never pop. > > Aside: Thanks for the test, Brent.
Created attachment 252812 [details] Patch This should fix the test and the work queue leak.
Carlos, I would change the first line of your commit message: "[GTK] WorkQueue runs forever" would be more descriptive. I tried to write a test to ensure the thread actually quits, but it's hard: detaching the thread in the destructor makes it impossible to join on it to ensure it completes, and counting threads in /proc is racy without a join, so we'd have to add new API to WorkQueue to change the behavior of the destructor just for such a test... and then we're not really testing the same functionality anymore. So I think we should be content with the existing refcount test.
Created attachment 252853 [details] Updated patch Zan pointed out that the thread could keep running forever if the work queue is released before the thread has run the main loop. This updated patch fixes that case by scheduling a main loop quit when the work queue is released before the thread has started. As Michael said, this is not easy to test, I ran some custom tests with printfs to verify the behaviour.
Comment on attachment 252853 [details] Updated patch View in context: https://bugs.webkit.org/attachment.cgi?id=252853&action=review > Source/WTF/wtf/gtk/WorkQueueGtk.cpp:72 > if (m_eventLoop) { This check shouldn't be necessary, we ASSERT the m_eventLoop existence in the constructor and don't destroy it anywhere.
(In reply to comment #19) > Comment on attachment 252853 [details] > Updated patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=252853&action=review > > > Source/WTF/wtf/gtk/WorkQueueGtk.cpp:72 > > if (m_eventLoop) { > > This check shouldn't be necessary, we ASSERT the m_eventLoop existence in > the constructor and don't destroy it anywhere. Yes, this is because in the past the platformInvalidate was not called by the destructor, but by invalidate method called by the user.
I'm going to leave the checks in the end, to ensure it keeps working even if the implementation changes again and platformInvalidate is moved somewhere else.
Committed r184072: <http://trac.webkit.org/changeset/184072>