Bug 220641

Summary: Add JSC API configuring GC signals in Linux
Product: WebKit Reporter: George Pittarelli <gjp>
Component: JavaScriptCoreAssignee: Yusuke Suzuki <ysuzuki>
Status: RESOLVED FIXED    
Severity: Normal CC: benjamin, cdumez, cmarcelo, ews-watchlist, keith_miller, mark.lam, msaboff, saam, tzagallo, webkit-bug-importer, ysuzuki
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Linux   
See Also: https://bugs.webkit.org/show_bug.cgi?id=223069
Bug Depends on:    
Bug Blocks: 221081    
Attachments:
Description Flags
Patch
ews-feeder: commit-queue-
Patch
mark.lam: review+
Patch
none
Patch none

Description George Pittarelli 2021-01-14 16:44:21 PST
JSC uses SIGUSR1 to suspend/resume threads on Linux. However, this signal handler can segfault if a process using JSC receives a SIGUSR1 at a very particular time: after JSC initialization (i.e. when the signal handler is installed), but before the first time that JSC actually does a Thread suspend.

I believe (from stack traces) that this is coming from this line: https://trac.webkit.org/browser/webkit/trunk/Source/WTF/wtf/posix/ThreadingPOSIX.cpp?rev=270806#L121

Specifically, the `targetThread` global is still nullptr in this situation. Perhaps a null check will suffice as a fix?
Comment 1 Yusuke Suzuki 2021-01-15 11:00:29 PST
This is a nice catch!
Comment 2 Yusuke Suzuki 2021-01-15 11:04:53 PST
I think JSC is not considering about the pattern emitting SIGUSR1 without interacting with JSC code, but just adding nullptr check is free, and if it can fix some usage, let's just add it.
Comment 3 Yusuke Suzuki 2021-01-15 11:28:46 PST
(In reply to George Pittarelli from comment #0)
> JSC uses SIGUSR1 to suspend/resume threads on Linux. However, this signal
> handler can segfault if a process using JSC receives a SIGUSR1 at a very
> particular time: after JSC initialization (i.e. when the signal handler is
> installed), but before the first time that JSC actually does a Thread
> suspend.
> 
> I believe (from stack traces) that this is coming from this line:
> https://trac.webkit.org/browser/webkit/trunk/Source/WTF/wtf/posix/
> ThreadingPOSIX.cpp?rev=270806#L121
> 
> Specifically, the `targetThread` global is still nullptr in this situation.
> Perhaps a null check will suffice as a fix?

I looked into it deeply, and I have some thoughts.

Can you share your story about why emitting SIGUSR1?
To handle your use case better, we need know how you are using SIGUSR1.

In JSC on Linux, JSC is not assuming SIGUSR1 is emitted outside of JSC.
JSC strongly assumes that targetThread is filled with argument thread before the signal is invoked, and this invariant is kept in JSC code. So, JSC will never see nullptr in targetThread without emitting SIGUSR1 outside of JSC.

Even though we insert null check against targetThread, it is still possible that,

1. targetThread is filled with already-freed Thread pointer (since JSC does not clear it after using, and it is still possible)
2. Conflict with JSC's suspension mechanism (like, sending this signal while JSC is about to sending this signal to suspend).

Several ways can be considered.

1. If you are using Linux, and if you are not using pthread_sigqueue to emit SIGUSR1, then JSC can switch to use pthread_sigqueue to emit signal in Linux, and check `if (info->si_code == SI_QUEUE)` to make signal sender narrower.
2. If you are sending a signal outside of the process using JSC, JSC can use si_pid to check wether the signal is sent by own process, and exclude handling.
Comment 4 George Pittarelli 2021-01-15 15:10:26 PST
Hi Yusuke, thank you for the investigation!

> Can you share your story about why emitting SIGUSR1?
> To handle your use case better, we need know how you are using SIGUSR1.

We have a binary that uses JSC, among other libraries. One of those other libraries also sets a SIGUSR1 handler, but that handler is effectively a no-op when triggered unintentionally. That handler is triggered from another process at arbitrary times. (Yes, there's a performance question here of JSC's SIGUSR1 triggering that other handler repeatedly, but we're addressing that separately.)

> 1. If you are using Linux, and if you are not using pthread_sigqueue to emit SIGUSR1, then JSC can switch to use pthread_sigqueue to emit signal in Linux, and check `if (info->si_code == SI_QUEUE)` to make signal sender narrower.
> 2. If you are sending a signal outside of the process using JSC, JSC can use si_pid to check wether the signal is sent by own process, and exclude handling.

Option 2 would solve our particular use case.

I worry about the other case: in general, what if someone tries to use JSC alongside another library that uses SIGUSR1 in the same way that JSC does? That said, I also don't see a 100% perfect fix. Is there any reason not to do both options? And if sigqueue is used, perhaps you could narrow even further with si_value?
Comment 5 Yusuke Suzuki 2021-01-15 15:50:01 PST
(In reply to George Pittarelli from comment #4)
> Hi Yusuke, thank you for the investigation!
> 
> > Can you share your story about why emitting SIGUSR1?
> > To handle your use case better, we need know how you are using SIGUSR1.
> 
> We have a binary that uses JSC, among other libraries. One of those other
> libraries also sets a SIGUSR1 handler, but that handler is effectively a
> no-op when triggered unintentionally. That handler is triggered from another
> process at arbitrary times. (Yes, there's a performance question here of
> JSC's SIGUSR1 triggering that other handler repeatedly, but we're addressing
> that separately.)
> 
> > 1. If you are using Linux, and if you are not using pthread_sigqueue to emit SIGUSR1, then JSC can switch to use pthread_sigqueue to emit signal in Linux, and check `if (info->si_code == SI_QUEUE)` to make signal sender narrower.
> > 2. If you are sending a signal outside of the process using JSC, JSC can use si_pid to check wether the signal is sent by own process, and exclude handling.
> 
> Option 2 would solve our particular use case.
> 
> I worry about the other case: in general, what if someone tries to use JSC
> alongside another library that uses SIGUSR1 in the same way that JSC does?
> That said, I also don't see a 100% perfect fix. Is there any reason not to
> do both options? And if sigqueue is used, perhaps you could narrow even
> further with si_value?

Unfortunately, in Linux, signaling is the most reliable way to suspend and resume the thread, and it is used in many concurrent GCes. So, I think no perfect solution exists unfortunately.
JSC uses suspend and resume mechanism to drive concurrent GC. So, at least one signal is required in Linux and BSD (IIRC, BSD has a way to suspend and resume threads, but they do not have a way to extract machine context of the suspended threads. Darwin and Windows have a way to suspend and resume threads, plus, extracting machine context of suspended threads).
Currently, we are using SIGUSR1. Possibly, we could offer a way to configure what signal will be used.
Or we can pick a signal that is more extremely rarely used one instead. For example, in Linux, realtime signals can be used for application purpose but they are not typically used in application.
Maybe, picking one of realtime signal would be better for application compatibility.

What do you think of?
Comment 6 Yusuke Suzuki 2021-01-15 15:55:14 PST
Yeah, possibly, using one of real-time signal would be better for Linux.
Since real-time signals are not hardcoded, good application can search to find an appropriate signal for their purpose. So, JSC can also find one of real-time signal and use it.
Comment 7 Yusuke Suzuki 2021-01-15 18:56:56 PST
(In reply to Yusuke Suzuki from comment #6)
> Yeah, possibly, using one of real-time signal would be better for Linux.
> Since real-time signals are not hardcoded, good application can search to
> find an appropriate signal for their purpose. So, JSC can also find one of
> real-time signal and use it.

Ah, unfortunately, no. real-time signals are always queued even if we do not use pthread_sigqueue. It is not queued if we use `kill`, but (I think this is not great) if we use pthread_kill, then this is queued. And queue size is limited, so we could have a fun issue.

I think we should try using SIGPWR or SIGXCPU.
https://superuser.com/questions/1185603/what-can-trigger-a-sigpwr-signal-that-interrupts-a-sendmsg-system-call
Comment 8 Yusuke Suzuki 2021-01-15 19:52:50 PST
Any signals can potentially conflict. So let's just use SIGUSR1 for now, and putting a function that allows embedders can change.
Comment 9 Yusuke Suzuki 2021-01-15 19:53:10 PST
Since there is no way in Linux, I think this is the best path.
Comment 10 Yusuke Suzuki 2021-01-15 20:56:17 PST
Created attachment 417759 [details]
Patch
Comment 11 Yusuke Suzuki 2021-01-15 21:10:27 PST
Created attachment 417760 [details]
Patch
Comment 12 Mark Lam 2021-01-16 15:18:38 PST
Comment on attachment 417760 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=417760&action=review

r=me with suggested improvement.

> Source/WTF/wtf/posix/ThreadingPOSIX.cpp:80
> +int sigThreadSuspendResume { -1 };
> +bool sigThreadSuspendResumeIsFrozen { false };

Why not store these as fields in WTFConfig where they can actually get frozen?
Comment 13 Yusuke Suzuki 2021-01-16 22:49:39 PST
Comment on attachment 417760 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=417760&action=review

>> Source/WTF/wtf/posix/ThreadingPOSIX.cpp:80
>> +bool sigThreadSuspendResumeIsFrozen { false };
> 
> Why not store these as fields in WTFConfig where they can actually get frozen?

It makes initialization code a bit complicated (we need to insert WTF::initialize() in TLS initialization) and we need another two flags because WTFConfig cannot initialize field with -1, but changed.
Comment 14 Yusuke Suzuki 2021-01-16 22:51:28 PST
Created attachment 417775 [details]
Patch
Comment 15 Yusuke Suzuki 2021-01-16 22:56:08 PST
Created attachment 417776 [details]
Patch
Comment 16 Yusuke Suzuki 2021-01-17 00:50:55 PST
Committed r271560: <https://trac.webkit.org/changeset/271560>
Comment 17 Radar WebKit Bug Importer 2021-01-17 00:51:13 PST
<rdar://problem/73292189>
Comment 18 Yusuke Suzuki 2021-01-17 00:59:29 PST
@George

Added `JSConfigureSignalForGC` SPI (not an API yet), so for now, you can use it by including `JSBasePrivate.h`. We could promote it to API later.
By default, JSC uses SIGUSR1. And embedder can select different signal if an embedder is using it e.g. SIGPWR, SIGXCPU etc. (for example, dotnet (mono), BoehmGC uses SIGPWR / SIGXCPU. Hotspot VM uses SIGUSR2.)
Call this function before initializing any part of JSC. After JSC initialization, JSConfigureSignalForGC fails (returning false).
Comment 19 Yusuke Suzuki 2021-01-17 01:23:28 PST
(In reply to Yusuke Suzuki from comment #18)
> @George
> 
> Added `JSConfigureSignalForGC` SPI (not an API yet), so for now, you can use
> it by including `JSBasePrivate.h`. We could promote it to API later.
> By default, JSC uses SIGUSR1. And embedder can select different signal if an
> embedder is using it e.g. SIGPWR, SIGXCPU etc. (for example, dotnet (mono),
> BoehmGC uses SIGPWR / SIGXCPU. Hotspot VM uses SIGUSR2.)
> Call this function before initializing any part of JSC. After JSC
> initialization, JSConfigureSignalForGC fails (returning false).

Maybe, we also add environment variable thing too later.