Bug 233980

Summary: A dedicated worker is not frozen even when the page is in back/forward cache
Product: WebKit Reporter: Hajime Hoshi <hajimehoshi>
Component: WebKit Misc.Assignee: Nobody <webkit-unassigned>
Status: NEW ---    
Severity: Normal CC: achristensen, ap, cdumez, fpizlo, ggaren, mark.lam, nham, webkit-bug-importer, youennf, ysuzuki
Priority: P2 Keywords: InRadar
Version: Safari 15   
Hardware: Unspecified   
OS: Unspecified   

Description Hajime Hoshi 2021-12-07 22:33:32 PST
Reproducing steps:

1. Open https://hajimehoshi.github.io/sandbox/worker/
2. See the numbers for a while
  There are two counters. The upper is counted by the main frame and the lower is counted by a dedicated worker.
3. Click 'Google.com'
4. Stay for a while (10 seconds or so)
5. Go back to the original page. The worker counter is increased for the time duration of being in google.com.

Source code: https://github.com/hajimehoshi/sandbox/tree/main/worker
Comment 1 Chris Dumez 2021-12-09 10:39:48 PST
The issue is that this worker thread is busy looping and thus never processing the message from the main thread asking it to suspend.
Comment 2 Geoffrey Garen 2021-12-09 10:43:08 PST
Maybe the suspend message should kill the worker if there's no timely reply.
Comment 3 Chris Dumez 2021-12-09 10:44:52 PST
(In reply to Geoffrey Garen from comment #2)
> Maybe the suspend message should kill the worker if there's no timely reply.

We could do that. I was thinking about calling WTF::Thread::suspend() instead of posting a message to the thread. Any opposition to that?
Comment 4 Alexey Proskuryakov 2021-12-09 10:48:23 PST
Would it suspend threads holding mutexes, and freeze the main thread as a consequence?
Comment 5 Chris Dumez 2021-12-09 10:49:56 PST
(In reply to Alexey Proskuryakov from comment #4)
> Would it suspend threads holding mutexes, and freeze the main thread as a
> consequence?

You're right, I think that would be a risk indeed and very likely the reason why our suspension logic currently works the way it does.

I guess timeout + kill is the way to go then.
Comment 6 Geoffrey Garen 2021-12-09 11:00:18 PST
In my suggestion to "kill the worker", I had assumed this was a ServiceWorker. But I see now that it's just a Worker.

It's an interesting edge case because there's no plain definition of "kill". Should we exit the process and possibly take down other pages with us?

I wonder if the right behavior is just to wait synchronously for the worker to terminate. Then, if the worker is hung, you get the same behavior you would have gotten if the main thread were hung. In practice, that might often trigger some kind of watchdog kill. Or something else.

It seems to me that modeling this like a hang of the main thread might be the most accurate thing, if that's possible.
Comment 7 Alexey Proskuryakov 2021-12-09 11:20:29 PST
There are other cases where dedicated workers are supposed to be paused or terminated (see https://html.spec.whatwg.org/multipage/workers.html#terminate-a-worker and around). E.g. I don't think that we are allowed to let it spin once it's no longer observable in an active page either.

I think that it's mostly implemented in WebKit, but have been out of the loop for too long to point it out.
Comment 8 Radar WebKit Bug Importer 2021-12-09 13:13:20 PST
<rdar://problem/86287415>
Comment 9 Yusuke Suzuki 2021-12-09 13:32:42 PST
(In reply to Chris Dumez from comment #5)
> (In reply to Alexey Proskuryakov from comment #4)
> > Would it suspend threads holding mutexes, and freeze the main thread as a
> > consequence?
> 
> You're right, I think that would be a risk indeed and very likely the reason
> why our suspension logic currently works the way it does.
> 
> I guess timeout + kill is the way to go then.

Yes. One example is that malloc can have a global mutex for the slow path (e.g. stealing a page from the global heap instead of thread local heap). If we suspend a thread holding that mutex, most likely, soon, the other threads will be blocked once malloc happens.
Comment 10 Mark Lam 2021-12-09 13:39:14 PST
(In reply to Yusuke Suzuki from comment #9)
> (In reply to Chris Dumez from comment #5)
> > (In reply to Alexey Proskuryakov from comment #4)
> > > Would it suspend threads holding mutexes, and freeze the main thread as a
> > > consequence?
> > 
> > You're right, I think that would be a risk indeed and very likely the reason
> > why our suspension logic currently works the way it does.
> > 
> > I guess timeout + kill is the way to go then.
> 
> Yes. One example is that malloc can have a global mutex for the slow path
> (e.g. stealing a page from the global heap instead of thread local heap). If
> we suspend a thread holding that mutex, most likely, soon, the other threads
> will be blocked once malloc happens.

We can use a mechanism similar to global GC's stop thread mechanism (doesn't exist yet) and stop the target thread at a GC consistent point.  Doing so ensure that the target thread won't be in the middle of any malloc operation.  That said, it doesn't guarantee yet that there won't be other locks that could be held.

If suspension is something we think we need, I can consider it in my GC work.  I think with a some work, it is achievable.
Comment 11 Geoffrey Garen 2021-12-09 13:50:50 PST
Technically the web standard makes reference to a behavior much more like JSC::Watchdog / JSC::VM::throwTerminationException().

For this case, I think just waiting for termination, combined with our process model, will get the job done.

For the more challenging case of explicitly calling Worker.terminate() in the middle of an infinite loop, I guess we'll eventually need to adopt / add support for JSC::VM::throwTerminationException() in Worker scripts.
Comment 12 Alexey Proskuryakov 2021-12-09 14:30:24 PST
(In reply to Geoffrey Garen from comment #11)
> Technically the web standard makes reference to a behavior much more like
> JSC::Watchdog / JSC::VM::throwTerminationException().

Exactly what I've been vaguely recalling, yes - see WorkerOrWorkletScriptController::scheduleExecutionTermination().
Comment 13 Mark Lam 2021-12-09 14:40:40 PST
(In reply to Geoffrey Garen from comment #11)
> For the more challenging case of explicitly calling Worker.terminate() in
> the middle of an infinite loop, I guess we'll eventually need to adopt / add
> support for JSC::VM::throwTerminationException() in Worker scripts.

We already do: Worker.terminate() uses the VMTraps mechanism, which allows a worker to terminate while in an infinite loop.
Comment 14 Geoffrey Garen 2021-12-09 14:42:06 PST
Maybe it's a simple fix just to use VMTraps for shutdown too?