RESOLVED FIXED 171159
REGRESSION(r215638): [Linux] Several worker tests are crashing in Thread::signalHandlerSuspendResume after r215638
https://bugs.webkit.org/show_bug.cgi?id=171159
Summary REGRESSION(r215638): [Linux] Several worker tests are crashing in Thread::sig...
Carlos Garcia Campos
Reported 2017-04-22 01:02:10 PDT
You can see backtraces in GTK+ bots: https://build.webkit.org/results/GTK Linux 64-bit Release (Tests)/r215663 (693)/results.html For example: https://build.webkit.org/results/GTK Linux 64-bit Release (Tests)/r215663 (693)/fast/workers/worker-document-leak-crash-log.txt
Attachments
Patch (2.67 KB, patch)
2017-04-22 03:33 PDT, Yusuke Suzuki
no flags
Patch (2.67 KB, patch)
2017-04-22 03:35 PDT, Yusuke Suzuki
mcatanzaro: review+
Yusuke Suzuki
Comment 1 2017-04-22 02:19:23 PDT
Super curious thing is that there is no user of calling Thread::suspend / Thread::resume. I'm still investigating.
Yusuke Suzuki
Comment 2 2017-04-22 02:23:04 PDT
I guess this is related to bug 170976. Before starting using signaling APIs, WTF in Linux uses SIGUSR2 to suspend and resume threads. I guess this suspend/resume handler is incorrectly routed from jscSignalHandler. Investigating.
Yusuke Suzuki
Comment 3 2017-04-22 02:25:16 PDT
We should use Signaling APIs to set up suspend / resume signals I guess.
Yusuke Suzuki
Comment 4 2017-04-22 02:28:00 PDT
OK, signaling API uses SIGUSR2 for there purpose, but it is already used in Linux to suspend and resume threads.
Yusuke Suzuki
Comment 5 2017-04-22 03:20:49 PDT
(In reply to Yusuke Suzuki from comment #4) > OK, signaling API uses SIGUSR2 for there purpose, but it is already used in > Linux to suspend and resume threads. suspend / resume is a bit tricky operation. We should use dedicated signal for this operation. Now SIGUSR2 is used by ThreadMessage. But SIGUSR1 becomes empty. We should use it instead.
Yusuke Suzuki
Comment 6 2017-04-22 03:33:08 PDT
Yusuke Suzuki
Comment 7 2017-04-22 03:35:04 PDT
Michael Catanzaro
Comment 8 2017-04-22 06:28:07 PDT
Comment on attachment 307905 [details] Patch I don't think it's possible to use UNIX signals for any purpose without introducing subtle race conditions that are extremely hard to detect or reason about. Is it possible to eventually eliminate our use of UNIX signals for thread suspension, or is there just no other way to accomplish this than by using signals?
Yusuke Suzuki
Comment 9 2017-04-22 06:54:12 PDT
(In reply to Michael Catanzaro from comment #8) > Comment on attachment 307905 [details] > Patch > > I don't think it's possible to use UNIX signals for any purpose without > introducing subtle race conditions that are extremely hard to detect or > reason about. Is it possible to eventually eliminate our use of UNIX signals > for thread suspension, or is there just no other way to accomplish this than > by using signals? If we want to have the identical mechanism (ability to suspend and resume threads), I believe we do not have a non-signal way in Linux. For non Linux env, sometimes an alternative is offered, like pthread_suspend_np. But it also has a problem that we cannot retrieve register sets from a suspended thread. If we do not want to have the current strong mechanism, the simplest way is something like safepoint approach. Maybe current CheckTrap op can be used. safepoint repeatedly appears in JIT code and JS mutators will park if the ccollector requires. Then, the stop the world op works. But to make it efficiently implemented, yet it involves SEGV signal handler :P Wihout it, I think it's not as efficient as is...
Yusuke Suzuki
Comment 10 2017-04-22 07:07:38 PDT
Filip Pizlo
Comment 11 2017-04-23 08:37:36 PDT
I have some thoughts: - MachineStackMarker should switch to using this new mechanism. You would use the functional message API to send a Function that scans the stack. We should test whether this is perf-neutral on macOS/iOS/etc, and if it is, then we should get rid of the thread_suspend/resume code and just use this. It's better if all OSes use the same approach because then we share test coverage, ideas, etc. We should only use different APIs on different OSes if it gives us some measurable progression. - You don't need the full CheckTraps mechanism, since that's more of a safepoint - you stop at a place where the JIT is able to vend additional information. We don't really need safepoints for stack scanning. One thread holds the JSLock, and that thread does need to be safepointed but it's already at a safepoint anytime the GC requests a stack scan. None of the other threads that MachineStackMarker looks at hold the JSLock and so they don't need to be safepointed. In fact, all we really need is a ragged safepoint, where the thread requesting scanning signals all of the threads and then waits for them to return with stack scan results. (In reply to Michael Catanzaro from comment #8) > Comment on attachment 307905 [details] > Patch > > I don't think it's possible to use UNIX signals for any purpose without > introducing subtle race conditions that are extremely hard to detect or > reason about. That's not true. Two sound uses are: - pthread_kill, as we are doing. - handling a signal that originated as a CPU trap, like SEGV, BUS, TRAP, ILL, FPE, etc. We use signal handling for those purposes. > Is it possible to eventually eliminate our use of UNIX signals > for thread suspension, or is there just no other way to accomplish this than > by using signals? It's not possible.
Yusuke Suzuki
Comment 12 2017-04-23 09:45:20 PDT
Interesting. (In reply to Filip Pizlo from comment #11) > I have some thoughts: > > - MachineStackMarker should switch to using this new mechanism. You would > use the functional message API to send a Function that scans the stack. We > should test whether this is perf-neutral on macOS/iOS/etc, and if it is, > then we should get rid of the thread_suspend/resume code and just use this. > It's better if all OSes use the same approach because then we share test > coverage, ideas, etc. We should only use different APIs on different OSes > if it gives us some measurable progression. From the point of cross-platform view, one obstacle is that ThreadMessage is not implemented in Windows since it does not have identical mechanism to pthread_kill. I'm not sure there is a good mechanism in Windows to implement such semantics. > - You don't need the full CheckTraps mechanism, since that's more of a > safepoint - you stop at a place where the JIT is able to vend additional > information. We don't really need safepoints for stack scanning. One > thread holds the JSLock, and that thread does need to be safepointed but > it's already at a safepoint anytime the GC requests a stack scan. None of > the other threads that MachineStackMarker looks at hold the JSLock and so > they don't need to be safepointed. In fact, all we really need is a ragged > safepoint, where the thread requesting scanning signals all of the threads > and then waits for them to return with stack scan results. Right, make sense. Mutator already has JSLock when performing GC, and other ones do not have JSLock for that VM. It sounds like similar to TLB shootdown implementation. One thread broadcasts the request for the stack scan, threads perform it and return the result. The requester waits for the results, then, all the threads start executing. > (In reply to Michael Catanzaro from comment #8) > > Comment on attachment 307905 [details] > > Patch > > > > I don't think it's possible to use UNIX signals for any purpose without > > introducing subtle race conditions that are extremely hard to detect or > > reason about. > > That's not true. Two sound uses are: > > - pthread_kill, as we are doing. > > - handling a signal that originated as a CPU trap, like SEGV, BUS, TRAP, > ILL, FPE, etc. > > We use signal handling for those purposes. > > > Is it possible to eventually eliminate our use of UNIX signals > > for thread suspension, or is there just no other way to accomplish this than > > by using signals? > > It's not possible.
Note You need to log in before you can comment on or make changes to this bug.