RESOLVED FIXED 213200
REGRESSION(r262991): Broke stress/ensure-crash.js
https://bugs.webkit.org/show_bug.cgi?id=213200
Summary REGRESSION(r262991): Broke stress/ensure-crash.js
Michael Catanzaro
Reported 2020-06-15 10:53:52 PDT
Moving from bug #212479. stress/ensure-crash.js is broken since r262991 on Red Hat's internal cloop CI: Running stress/ensure-crash.js.default stress/ensure-crash.js.default: ERROR: Unexpected exit code: 134 FAIL: stress/ensure-crash.js.default Running stress/ensure-crash.js.mini-mode stress/ensure-crash.js.mini-mode: ERROR: Unexpected exit code: 134 FAIL: stress/ensure-crash.js.mini-mode 134 - 128 = 6 = SIGABRT on Linux, so the test is failing with SIGABRT. (SIGILL = 4 so if it were to exit with SIGILL its exit status would be 132.) Skimming the commit, I don't immediately see what could be causing a SIGABRT. I thought the most likely problem was that this change assumes HAVE(MACHINE_CONTEXT) is enabled, but looking in PlatformHave.h, that is not true on two of the four architectures we test (ppc64le and s390x). However, the test is failing in the same way on both x86_64 and aarch64, so that must not be the only problem here. I wonder if it might be related to ENABLE(SIGNAL_BASED_VM_TRAPS), which depends on DFG and is therefore disabled in cloop.
Attachments
proposed patch. (3.33 KB, patch)
2020-06-15 15:04 PDT, Mark Lam
mcatanzaro: review+
patch for landing. (4.33 KB, patch)
2020-06-15 15:23 PDT, Mark Lam
no flags
Patch (6.66 KB, patch)
2020-06-15 16:29 PDT, Michael Catanzaro
no flags
Michael Catanzaro
Comment 1 2020-06-15 11:13:46 PDT
Mark says bug #212543 might help.
Mark Lam
Comment 2 2020-06-15 13:11:51 PDT
(In reply to Michael Catanzaro from comment #1) > Mark says bug #212543 might help. I was wrong. 212542 won't help. But I think I know what's happening. Will provide details after I confirm things.
Mark Lam
Comment 3 2020-06-15 13:26:46 PDT
The root cause of this is because of https://trac.webkit.org/r238478 i.e. non-darwin ports chooses to implement CRASH() using abort.
Michael Catanzaro
Comment 4 2020-06-15 14:31:35 PDT
(In reply to Mark Lam from comment #3) > The root cause of this is because of https://trac.webkit.org/r238478 i.e. > non-darwin ports chooses to implement CRASH() using abort. We want CRASH() to raise SIGABRT because that's the expected way to express an assertion failure crash on Linux. Before that commit, it would have been SIGSEGV (not SIGILL).
Mark Lam
Comment 5 2020-06-15 14:32:31 PDT
(In reply to Michael Catanzaro from comment #4) > (In reply to Mark Lam from comment #3) > > The root cause of this is because of https://trac.webkit.org/r238478 i.e. > > non-darwin ports chooses to implement CRASH() using abort. > > We want CRASH() to raise SIGABRT because that's the expected way to express > an assertion failure crash on Linux. Before that commit, it would have been > SIGSEGV (not SIGILL). I understand. I'm investigating a fix now.
Mark Lam
Comment 6 2020-06-15 15:04:34 PDT
Created attachment 401944 [details] proposed patch.
Michael Catanzaro
Comment 7 2020-06-15 15:10:31 PDT
Comment on attachment 401944 [details] proposed patch. Thanks for investigating! The change looks good, but I don't think it will fully fix the crash because HAVE(MACHINE_CONTEXT) is false on two of the four tested architectures.
Mark Lam
Comment 8 2020-06-15 15:14:58 PDT
(In reply to Michael Catanzaro from comment #7) > Comment on attachment 401944 [details] > proposed patch. > > Thanks for investigating! > > The change looks good, but I don't think it will fully fix the crash because > HAVE(MACHINE_CONTEXT) is false on two of the four tested architectures. I think you'll just have to skip the tests that expect crashes (this with `//@ crashOK!` at the top) for those platforms. That is unless they can actually handle signals and for some reason is misclaiming that they !HAVE(MACHINE_CONTEXT).
Mark Lam
Comment 9 2020-06-15 15:23:19 PDT
Created attachment 401947 [details] patch for landing. Thanks for the review. I'm also going to unskip the ensure-crash.js test for armv7 and mips. Will let the EWS run on this patch before landing.
Michael Catanzaro
Comment 10 2020-06-15 16:20:37 PDT
(In reply to Mark Lam from comment #8) > I think you'll just have to skip the tests that expect crashes (this with > `//@ crashOK!` at the top) for those platforms. That is unless they can > actually handle signals and for some reason is misclaiming that they > !HAVE(MACHINE_CONTEXT). I can add per-architecture skips if need be, but hopefully there's a better way. :) TL;DR I think the affected code doesn't actually need to be guarded by HAVE(MACHINE_CONTEXT) at all? Signals work the same on all architectures. I can't imagine how broken things would be if the OS couldn't handle UNIX signals. :D PlatformHave.h has this: #if OS(DARWIN) || OS(FUCHSIA) || ((OS(FREEBSD) || defined(__GLIBC__) || defined(__BIONIC__)) && (CPU(X86) || CPU(X86_64) || CPU(ARM) || CPU(ARM64) || CPU(MIPS))) #define HAVE_MACHINE_CONTEXT 1 #endif mcontext is actually available on all architectures when the C library is glibc; however, since mcontext is architecture-specific, any code that wants to use it needs to be written separately for each desired architecture. I see JSC's MachineContext.h includes platform-specific code for many architectures, but there's no way to make this work for arbitrary architectures. I *personally* only care about a few specific architectures (x86_64, aarch64, ppc64le, and s390x), but since signal handling is not platform-specific and doesn't require mcontext, ideally we wouldn't require mcontext at all. Also, I know Igalia cares about non-glibc too; they use either ulibc or muslc (I can never remember which) on some embedded systems. (I think(?) that mcontext is implemented by both of these, but ucontext is probably not, and our MachineContext.h does depend on ucontext.) Anyway, it looks like the affected code in jsc.cpp does not depend on MachineContext.h at all. It only uses WTF's Signals.h, which is guarded by HAVE(MACHINE_CONTEXT), but I don't see why. Guess: it's guarded by HAVE(MACHINE_CONTEXT) because originally it was only needed for that case? I think we can just remove HAVE(MACHINE_CONTEXT) from the guard... or change it to #if !OS(WINDOWS) or something like that. Your patch still looks good though, don't hesitate to land. I'll cook a patch to change the guards tomorrow.
Michael Catanzaro
Comment 11 2020-06-15 16:22:08 PDT
(In reply to Michael Catanzaro from comment #10) > Your patch still looks good though, don't hesitate to land. I'll cook a > patch to change the guards tomorrow. Actually let me try uploading an alternate version of your patch with these guards changed and see if it passes EWS.
Michael Catanzaro
Comment 12 2020-06-15 16:29:57 PDT
Michael Catanzaro
Comment 13 2020-06-15 16:30:45 PDT
(In reply to Michael Catanzaro from comment #12) > Created attachment 401955 [details] > Patch Let me know what you think of this. I'll do a local build with HAVE(MACHINE_CONTEXT) disabled now to make sure it actually builds.
Mark Lam
Comment 14 2020-06-15 16:31:47 PDT
(In reply to Michael Catanzaro from comment #11) > (In reply to Michael Catanzaro from comment #10) > > Your patch still looks good though, don't hesitate to land. I'll cook a > > patch to change the guards tomorrow. > > Actually let me try uploading an alternate version of your patch with these > guards changed and see if it passes EWS. I don't think we should remove HAVE(MACHINE_CONTEXT). Even on unixes, there's no guarantee that the struct for the signal context is the same, right? The platform that opts in to HAVE(MACHINE_CONTEXT) must do the work to make it work first. It doesn't make sense to opt all unixes in blindly.
Mark Lam
Comment 15 2020-06-15 16:33:22 PDT
(In reply to Mark Lam from comment #14) > (In reply to Michael Catanzaro from comment #11) > > (In reply to Michael Catanzaro from comment #10) > > > Your patch still looks good though, don't hesitate to land. I'll cook a > > > patch to change the guards tomorrow. > > > > Actually let me try uploading an alternate version of your patch with these > > guards changed and see if it passes EWS. > > I don't think we should remove HAVE(MACHINE_CONTEXT). Even on unixes, > there's no guarantee that the struct for the signal context is the same, > right? The platform that opts in to HAVE(MACHINE_CONTEXT) must do the work > to make it work first. It doesn't make sense to opt all unixes in blindly. And please don't just obsolete my patch and replace it with yours. Now, my patch won't get the EWS validation that I put it there for.
Michael Catanzaro
Comment 16 2020-06-15 16:35:47 PDT
(In reply to Mark Lam from comment #14) > I don't think we should remove HAVE(MACHINE_CONTEXT). Even on unixes, > there's no guarantee that the struct for the signal context is the same, > right? The mcontext struct is going to be different on every architecture, but jsc.cpp and Signals.[cpp,h] don't use it anywhere, so it should be fine. (There is code in JSC that really does depend on mcontext, just not this code.) Sorry for interfering with your EWS run. :(
Michael Catanzaro
Comment 17 2020-06-15 16:43:52 PDT
Comment on attachment 401955 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=401955&action=review > Source/WTF/wtf/threads/Signals.h:28 > -#if USE(PTHREADS) && HAVE(MACHINE_CONTEXT) > +#if USE(PTHREADS) Note this code does not use pthreads either, so really these are not good guards... how about #if !OS(WINDOWS)? I bet the original usage of these classes in JSC depends on pthreads and mcontext; that's probably how these guards were originally selected.
Michael Catanzaro
Comment 18 2020-06-15 16:48:18 PDT
Comment on attachment 401955 [details] Patch Obsoleting my patch, Mark asked me to use a separate bug to change the guards.
Michael Catanzaro
Comment 19 2020-06-15 20:01:34 PDT
Comment on attachment 401947 [details] patch for landing. View in context: https://bugs.webkit.org/attachment.cgi?id=401947&action=review > Source/WTF/wtf/threads/Signals.h:53 > +#if !OS(DARWIN) > + Abort, > +#endif Looking this over again, the #if !OS(DARWIN) guards aren't really needed in Signals.h. The guards in jsc.cpp should suffice to avoid installing the signal handler on Apple systems. But even then, I guess it wouldn't hurt to install the SIGABRT handler unconditionally, since the code looks like it's trying to handle every possible crash signal, and it seems odd for only SIGABRT to be conspicuously missing. (Linux has a couple more crash signals, but they are truly obscure. Is SIGABRT really not common on macOS?)
Mark Lam
Comment 20 2020-06-15 20:08:31 PDT
Comment on attachment 401947 [details] patch for landing. View in context: https://bugs.webkit.org/attachment.cgi?id=401947&action=review >> Source/WTF/wtf/threads/Signals.h:53 >> +#endif > > Looking this over again, the #if !OS(DARWIN) guards aren't really needed in Signals.h. The guards in jsc.cpp should suffice to avoid installing the signal handler on Apple systems. But even then, I guess it wouldn't hurt to install the SIGABRT handler unconditionally, since the code looks like it's trying to handle every possible crash signal, and it seems odd for only SIGABRT to be conspicuously missing. (Linux has a couple more crash signals, but they are truly obscure. Is SIGABRT really not common on macOS?) I added it because we deliberately don't need or want this for OS(DARWIN). There should never be a test that generates a SIGABRT. Only non-DARWIN ports will ever encounter SIGABRT by their own choice. Alternatively, I can make it more restricted to Apple specific platforms. The purpose of the handler is not to catch all crashes, only crashes that we think we will generate deliberately. This is facility for testing that expected crash points will crash.
Mark Lam
Comment 21 2020-06-16 00:56:09 PDT
Comment on attachment 401947 [details] patch for landing. armv7 and mips bots are happy. Landing now.
Mark Lam
Comment 22 2020-06-16 00:59:26 PDT
*** Bug 213207 has been marked as a duplicate of this bug. ***
EWS
Comment 23 2020-06-16 01:32:51 PDT
Committed r263082: <https://trac.webkit.org/changeset/263082> All reviewed patches have been landed. Closing bug and clearing flags on attachment 401947 [details].
Radar WebKit Bug Importer
Comment 24 2020-06-16 01:33:19 PDT
Michael Catanzaro
Comment 25 2020-06-17 10:47:07 PDT
(In reply to Michael Catanzaro from comment #0) > Moving from bug #212479. stress/ensure-crash.js is broken since r262991 on > Red Hat's internal cloop CI I've verified this is now fixed with the combination of r263082, r263074, and r263123.
Note You need to log in before you can comment on or make changes to this bug.