Bug 221129

Summary: Flaky JSC test: stress/shared-array-buffer-sort-while-different-thread-is-modifying.js.default
Product: WebKit Reporter: Michael Catanzaro <mcatanzaro>
Component: JavaScriptCoreAssignee: Michael Catanzaro <mcatanzaro>
Status: RESOLVED FIXED    
Severity: Normal CC: ews-watchlist, fpizlo, keith_miller, mark.lam, mcatanzaro, msaboff, saam, tzagallo, webkit-bug-importer, ysuzuki
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: PC   
OS: Linux   
See Also: https://bugzilla.redhat.com/show_bug.cgi?id=1917837
Attachments:
Description Flags
Patch
none
Patch
ews-feeder: commit-queue-
Patch saam: review+, ews-feeder: commit-queue-

Description Michael Catanzaro 2021-01-29 06:30:13 PST
JSC stress test is flaky on Red Hat's internal CI (which runs with JIT disabled and cloop enabled). The first known failure was Dec 13. I didn't report it until now because until earlier this week, all the failures occurred on ppc64le, so I incorrectly assumed the issue was specific to that architecture. But the test has since failed on both s390x and x86_64 as well, so it's a more general problem.

Unfortunately, the error message is pretty generic:

Running stress/shared-array-buffer-sort-while-different-thread-is-modifying.js.default
stress/shared-array-buffer-sort-while-different-thread-is-modifying.js.default: Exception: JavaScript execution terminated.
stress/shared-array-buffer-sort-while-different-thread-is-modifying.js.default: ERROR: Unexpected exit code: 1
FAIL: stress/shared-array-buffer-sort-while-different-thread-is-modifying.js.default

I don't know what to make of "JavaScript execution terminated," but it's this same error on all architectures.

Based on the failure history, it might be due to some change that landed in trunk during early December:

Dec 13: failed on ppc64le
Jan 5: failed on ppc64le
Jan 8: failed on ppc64le
Jan 19: failed on ppc64le
Jan 26: failed on s390x
Jan 29: failed on s390x and x86_64
Comment 1 Radar WebKit Bug Importer 2021-02-05 06:31:14 PST
<rdar://problem/74023276>
Comment 2 Michael Catanzaro 2021-02-09 08:02:52 PST
Created attachment 419715 [details]
Patch
Comment 3 Michael Catanzaro 2021-02-09 08:06:08 PST
(In reply to Michael Catanzaro from comment #0)
> Based on the failure history, it might be due to some change that landed in
> trunk during early December:
> 
> Dec 13: failed on ppc64le
> Jan 5: failed on ppc64le
> Jan 8: failed on ppc64le
> Jan 19: failed on ppc64le
> Jan 26: failed on s390x
> Jan 29: failed on s390x and x86_64

Feb 2: failed on s390x
Feb 6: failed on x86_64
Comment 4 Michael Catanzaro 2021-02-16 14:16:35 PST
Sadly this is going to be very difficult to track down. I am currently running this test in a continuous loop on (a) my desktop, and (b) the s390x server that we use for our internal CI (since it seems to fail more often on this server than it does for x86_64). Sadly the test passes every time. Since it is a timing issue, I bet it only fails when the tests are run all at once instead of one at a time. I was hoping to bisect it, but it doesn't fail often enough for it to be practical to bisect in this way.
Comment 5 Michael Catanzaro 2021-03-02 07:40:07 PST
This test is still flaky on all architectures:

Feb 13: failed on ppc64le
Feb 14: failed on s390x
Feb 15: failed on s390x
Feb 19: failed on ppc64le
Feb 20: failed on aarch64
March 1: failed on ppc64le
Comment 6 Michael Catanzaro 2021-03-02 07:41:28 PST
Comment on attachment 419715 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=419715&action=review

> JSTests/stress/shared-array-buffer-sort-while-different-thread-is-modifying.js:3
> +//@ skip
>  //@ runDefault("--watchdog=1000", "--watchdog-exception-ok")
> +// FIXME: unskip this test when https://bugs.webkit.org/show_bug.cgi?id=221129 is fixed.

Can we land this...?
Comment 7 Yusuke Suzuki 2021-03-02 11:12:22 PST
@Michael Thanks. Could you paste the crash trace of the failure?
Comment 8 Michael Catanzaro 2021-03-02 11:41:51 PST
Unfortunately I don't think it's crashing because the process exit status is 1. If there was a crash, the exit status would be (128 + signal value). So we don't have much to go on other than the stdout/stderr in comment #0.
Comment 9 Yusuke Suzuki 2021-03-02 15:29:25 PST
OK, "JavaScript execution terminated" this sounds like this is not a real bug. Possibly, the thread is terminated, and we are not properly handling that case in jsc shell...?
Comment 10 Yusuke Suzuki 2021-03-02 16:15:06 PST
Created attachment 422016 [details]
Patch
Comment 11 Yusuke Suzuki 2021-03-02 16:16:48 PST
Created attachment 422019 [details]
Patch
Comment 12 Michael Catanzaro 2021-03-02 16:44:34 PST
Comment on attachment 422019 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=422019&action=review

Thanks Yusuke!

> Source/JavaScriptCore/jsc.cpp:416
> +    CommandLine(CommandLineForWorkersTag);

This could use: explicit
Comment 13 Yusuke Suzuki 2021-03-02 17:56:38 PST
Comment on attachment 422019 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=422019&action=review

Thanks!

>> Source/JavaScriptCore/jsc.cpp:416
>> +    CommandLine(CommandLineForWorkersTag);
> 
> This could use: explicit

Fixed.
Comment 14 Yusuke Suzuki 2021-03-02 18:04:22 PST
Committed r273779 (234776@main): <https://commits.webkit.org/234776@main>