Summary: | [GLib] Leaked RunLoop objects on worker threads | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Zan Dobersek <zan> | ||||||||
Component: | New Bugs | Assignee: | Zan Dobersek <zan> | ||||||||
Status: | RESOLVED FIXED | ||||||||||
Severity: | Normal | CC: | aperez, benjamin, cdumez, cgarcia, cmarcelo, ews-watchlist, ggaren, Hironori.Fujii, webkit-bug-importer, ysuzuki | ||||||||
Priority: | P2 | Keywords: | InRadar | ||||||||
Version: | WebKit Nightly Build | ||||||||||
Hardware: | Unspecified | ||||||||||
OS: | Unspecified | ||||||||||
See Also: |
https://bugs.webkit.org/show_bug.cgi?id=219591 https://bugs.webkit.org/show_bug.cgi?id=219593 https://bugs.webkit.org/show_bug.cgi?id=221115 |
||||||||||
Attachments: |
|
Description
Zan Dobersek
2020-11-21 00:44:27 PST
Created attachment 414759 [details]
RunLoop::Holder cleanup
One possible fix to this issue is explicitly clearing out the dispatch queues in RunLoop::Holder destructor, which is called when the TLS storage for this thread-specific RunLoop is being destroyed.
Created attachment 414975 [details]
Patch
Comment on attachment 414975 [details]
Patch
Nice catch!
This kind of feels like the wrong level at which to fix this bug.
I wonder if TimerBase can hold a weak pointer to RunLoop instead.
I guess we have a few related problems: 1. DispatchTimer retains RunLoop. If a thread exits before a scheduled dispatchAfter fires, we leak. 2. RunLoop::dispatch() lambdas may retain anything. If a thread exits before a dispatch() lambda dequeues, we leak. 3. The function body of a timer or dispatch may be responsible for releasing something. If a thread exits before a function body executes, we leak. (This is the general case of #1.) For #1, maybe RunLoop should keep an explicit retained set of DispatchTimers waiting to fire (instead of DispatchTimer owning itself). dispatchAfter() would add to the set, and firing the timer would remove from the set. That way, if the RunLoop is destroyed, all DispatchTimers in the set are freed. For #2, I guess it's right to clear the queues. But can move that clearing into a member function named something like RunLoop::threadWillExit()? I'm torn about whether we should clear the queues or ASSERT that they're already empty. See below. For #3, maybe there's nothing we can do about it, and it's just programmer error to allow a thread to exit when it has more work to do. I think DispatchTimer problem would be resolved if queues were emptied out upon thread destruction and the DispatchTimer ownership was tied to the lambda lifetime and not its execution. I'm not sure how much people could be blamed for dispatches still pending at the point of thread determination. It's certainly problematic if the queued dispatches were supposed to manually release resources that otherwise weren't tied to the dispatch functor lifetime. There could be a helper class made available, instances of which could be included in lambdas and would have to be manually marked upon dispatch, and asserting upon destruction of those instances if marking was not done. (In reply to Geoffrey Garen from comment #4) > Comment on attachment 414975 [details] > Patch > > Nice catch! > > This kind of feels like the wrong level at which to fix this bug. > > I wonder if TimerBase can hold a weak pointer to RunLoop instead. A WeakPtr would work for the GLib implementation. Beyond the TimerBase constructor, the only place where RunLoop is used is during the dispatch of the RunLoop::dispatch() source, which wouldn't happen after the RunLoop is already destroyed. Another option is to remove the m_runLoop member entirely from TimberBase, and instead require the caller to pass in a RunLoop when scheduling a timer into a RunLoop. That seems to be the way that the CoreFoundation API avoids a circular reference between timer and runloop. > I think DispatchTimer problem would be resolved if queues were emptied out
> upon thread destruction and the DispatchTimer ownership was tied to the
> lambda lifetime and not its execution.
I don't think emptying WTF::RunLoop queues is sufficient to address DispatchTimer because DispatchTimer is not in any WTF::RunLoop queue; instead it is registered with lower-level OS timer primitives (GSource in the Glib case).
(In reply to Geoffrey Garen from comment #9) > > I think DispatchTimer problem would be resolved if queues were emptied out > > upon thread destruction and the DispatchTimer ownership was tied to the > > lambda lifetime and not its execution. > > I don't think emptying WTF::RunLoop queues is sufficient to address > DispatchTimer because DispatchTimer is not in any WTF::RunLoop queue; > instead it is registered with lower-level OS timer primitives (GSource in > the Glib case). Yes, this is correct. I was wrong before. Created attachment 415284 [details]
Patch
(In reply to Zan Dobersek from comment #11) > Created attachment 415284 [details] > Patch This still only addresses the original bug, i.e. case #2. It now adds and uses RunLoop::threadWillExit(), and a unit test is provided. Removing (In reply to Geoffrey Garen from comment #8) > Another option is to remove the m_runLoop member entirely from TimberBase, > and instead require the caller to pass in a RunLoop when scheduling a timer > into a RunLoop. That seems to be the way that the CoreFoundation API avoids > a circular reference between timer and runloop. At least for GLib, it could be possible to drop the RunLoop reference from RunLoop::TimerBase. We don't need it beyond the GSource creation in the constructor. Comment on attachment 415284 [details]
Patch
r=me
Would be nice to do a follow-up for timers. Still, this is clearly an improvement over the status quo.
I'll be opening separate bugs for the general dispatchAfter() leaks and the RunLoop ref removal from TimerBase. Committed r270496: <https://trac.webkit.org/changeset/270496> All reviewed patches have been landed. Closing bug and clearing flags on attachment 415284 [details]. Comment on attachment 415284 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=415284&action=review > Source/WTF/wtf/RunLoop.cpp:177 > + m_nextIteration.clear(); This causes a problem. Filed: Bug 221115 – RunLoop::threadWillExit is doing m_nextIteration.clear() without locking m_nextIterationLock |