Bug 222601 - [JSC] detect infrastructure failure for remote stress tests
Summary: [JSC] detect infrastructure failure for remote stress tests
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Angelos Oikonomopoulos
URL:
Keywords: InRadar
Depends on: 224423
Blocks:
  Show dependency treegraph
 
Reported: 2021-03-02 08:32 PST by Angelos Oikonomopoulos
Modified: 2021-05-11 01:34 PDT (History)
13 users (show)

See Also:


Attachments
Patch (18.40 KB, patch)
2021-03-02 08:35 PST, Angelos Oikonomopoulos
no flags Details | Formatted Diff | Diff
Patch (18.36 KB, patch)
2021-03-02 09:12 PST, Angelos Oikonomopoulos
no flags Details | Formatted Diff | Diff
Patch (20.28 KB, patch)
2021-03-03 06:12 PST, Angelos Oikonomopoulos
no flags Details | Formatted Diff | Diff
Patch (20.29 KB, patch)
2021-03-03 06:44 PST, Angelos Oikonomopoulos
no flags Details | Formatted Diff | Diff
Patch (26.66 KB, patch)
2021-04-29 06:27 PDT, Angelos Oikonomopoulos
no flags Details | Formatted Diff | Diff
Patch (26.70 KB, patch)
2021-05-05 01:20 PDT, Angelos Oikonomopoulos
no flags Details | Formatted Diff | Diff
Patch (26.68 KB, patch)
2021-05-06 04:56 PDT, Angelos Oikonomopoulos
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Angelos Oikonomopoulos 2021-03-02 08:32:11 PST
[JSC] detect infrastructure failure for remote stress tests
Comment 1 Angelos Oikonomopoulos 2021-03-02 08:35:58 PST
Created attachment 421943 [details]
Patch
Comment 2 Angelos Oikonomopoulos 2021-03-02 08:38:29 PST
The current patch only changes jsc-stress-test-writer-default.rb. Uploading this for early comments, will update the other 2 writers if things look good.
Comment 3 Angelos Oikonomopoulos 2021-03-02 09:12:34 PST
Created attachment 421947 [details]
Patch
Comment 4 Angelos Oikonomopoulos 2021-03-03 06:12:09 PST
Created attachment 422072 [details]
Patch
Comment 5 Angelos Oikonomopoulos 2021-03-03 06:44:57 PST
Created attachment 422073 [details]
Patch
Comment 6 Radar WebKit Bug Importer 2021-03-09 08:33:14 PST
<rdar://problem/75218695>
Comment 7 Angelos Oikonomopoulos 2021-03-10 07:36:16 PST
Any objections in principle?
Comment 8 Yusuke Suzuki 2021-04-10 13:01:17 PDT
Comment on attachment 422073 [details]
Patch

r=me
Comment 9 EWS 2021-04-10 13:14:28 PDT
Committed r275801 (236372@main): <https://commits.webkit.org/236372@main>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 422073 [details].
Comment 10 WebKit Commit Bot 2021-04-11 22:34:11 PDT
Re-opened since this is blocked by bug 224423
Comment 11 Yusuke Suzuki 2021-04-11 22:35:00 PDT
@Mark found that it seems that run-jsc-stress-tests output includes a lot of texts now. So I've reverted it.

    0/75193 .............................Could not open file: /Users/cyril/mozilla_funfuzz/wtmp16/regression-tests.list
Could not open file: /Users/cyril/mozilla_funfuzz/wtmp16/regression-tests.list
Could not open file: /Users/cyril/mozilla_funfuzz/wtmp16/regression-tests.list
Could not open file: /Users/cyril/mozilla_funfuzz/wtmp16/regression-tests.list
...
Could not open file: /Users/cyril/mozilla_funfuzz/wtmp16/regression-tests.list
    2/75193 ............................Could not open file: /Users/cyril/mozilla_funfuzz/wtmp16/regression-tests.list
Could not open file: /Users/cyril/mozilla_funfuzz/wtmp16/regression-tests.list
    3/75193 .............................Could not open file: /Users/cyril/mozilla_funfuzz/wtmp16/regression-tests.list
Could not open file: /Users/cyril/mozilla_funfuzz/wtmp16/regression-tests.list
...
Comment 12 Yusuke Suzuki 2021-04-11 22:35:16 PDT
@Angelos can you take a look?
Comment 13 Angelos Oikonomopoulos 2021-04-11 23:42:16 PDT
(In reply to Yusuke Suzuki from comment #12)
> @Angelos can you take a look?

Absolutely. Note that, as stated in an earlier comment, I'll need to update the other two test writers before this can go in. Sorry that wasn't clear!

I'll submit a new version when I figure out the issues that cropped up. Thanks for taking a look!
Comment 14 Angelos Oikonomopoulos 2021-04-13 07:31:27 PDT
(In reply to Yusuke Suzuki from comment #11)
> @Mark found that it seems that run-jsc-stress-tests output includes a lot of
> texts now. So I've reverted it.
> 
>     0/75193 .............................Could not open file:
> /Users/cyril/mozilla_funfuzz/wtmp16/regression-tests.list
> Could not open file:
> /Users/cyril/mozilla_funfuzz/wtmp16/regression-tests.list
> Could not open file:
> /Users/cyril/mozilla_funfuzz/wtmp16/regression-tests.list
> Could not open file:
> /Users/cyril/mozilla_funfuzz/wtmp16/regression-tests.list
> ...
> Could not open file:
> /Users/cyril/mozilla_funfuzz/wtmp16/regression-tests.list
>     2/75193 ............................Could not open file:
> /Users/cyril/mozilla_funfuzz/wtmp16/regression-tests.list
> Could not open file:
> /Users/cyril/mozilla_funfuzz/wtmp16/regression-tests.list
>     3/75193 .............................Could not open file:
> /Users/cyril/mozilla_funfuzz/wtmp16/regression-tests.list
> Could not open file:
> /Users/cyril/mozilla_funfuzz/wtmp16/regression-tests.list
> ...

The above to be messages from some non-public usage of funfuzz? I'm unclear as to how that uses run-jsc-stress-tests or where the regression-tests.list file comes from and what run-jsc-stress-tests has got to do with it.

What additional output does run-jsc-stress-tests produce with this patch?

The above messages don't come from any script generated by it AFAIK. (Does @Mark refer to Mark Lam? If so, please let me know and I can ping them directly.)
Comment 15 Mark Lam 2021-04-13 08:07:06 PDT
(In reply to Angelos Oikonomopoulos from comment #14)
> (In reply to Yusuke Suzuki from comment #11)
> > @Mark found that it seems that run-jsc-stress-tests output includes a lot of
> > texts now. So I've reverted it.
> > 
> >     0/75193 .............................Could not open file:
> > /Users/cyril/mozilla_funfuzz/wtmp16/regression-tests.list
> > Could not open file:
> > /Users/cyril/mozilla_funfuzz/wtmp16/regression-tests.list
> > Could not open file:
> > /Users/cyril/mozilla_funfuzz/wtmp16/regression-tests.list
> > Could not open file:
> > /Users/cyril/mozilla_funfuzz/wtmp16/regression-tests.list
> > ...
> > Could not open file:
> > /Users/cyril/mozilla_funfuzz/wtmp16/regression-tests.list
> >     2/75193 ............................Could not open file:
> > /Users/cyril/mozilla_funfuzz/wtmp16/regression-tests.list
> > Could not open file:
> > /Users/cyril/mozilla_funfuzz/wtmp16/regression-tests.list
> >     3/75193 .............................Could not open file:
> > /Users/cyril/mozilla_funfuzz/wtmp16/regression-tests.list
> > Could not open file:
> > /Users/cyril/mozilla_funfuzz/wtmp16/regression-tests.list
> > ...
> 
> The above to be messages from some non-public usage of funfuzz? I'm unclear
> as to how that uses run-jsc-stress-tests or where the regression-tests.list
> file comes from and what run-jsc-stress-tests has got to do with it.
> 
> What additional output does run-jsc-stress-tests produce with this patch?
> 
> The above messages don't come from any script generated by it AFAIK. (Does
> @Mark refer to Mark Lam? If so, please let me know and I can ping them
> directly.)

Yes, Yusuke was referring to me.  These messages came from internal tests.  Before your patch, running the script did not produce this output.  After your patch, it does.  I suspect that the tests were always printing these output.  However, your change somehow expose them.  Do you have any idea where your patch enables more verbose stderr/stdout output to be shown?
Comment 16 Angelos Oikonomopoulos 2021-04-13 08:39:06 PDT
(In reply to Mark Lam from comment #15)
> (In reply to Angelos Oikonomopoulos from comment #14)
> > (In reply to Yusuke Suzuki from comment #11)
[...]
> > The above messages don't come from any script generated by it AFAIK. (Does
> > @Mark refer to Mark Lam? If so, please let me know and I can ping them
> > directly.)
> 
> Yes, Yusuke was referring to me.  These messages came from internal tests. 
> Before your patch, running the script did not produce this output.  After
> your patch, it does.  I suspect that the tests were always printing these
> output.  However, your change somehow expose them.  Do you have any idea
> where your patch enables more verbose stderr/stdout output to be shown?

I've taken a second look and I don't see any instances where the patch changes where stdout/stderr go, but it's tricky code. I can't see any extra output in my testing of course (I just checked again both with local and --remote runs).

webkitruby/jsc-stress-test-writer-default.rb defines a number of handlers that the patch modifies. Are you at liberty to share what test kind is producing the extraneous output? I.e. which run* function is invoked in run-jsc-stress-tests? If so, then I could look at a diff of the generated code in detail.
Comment 17 Angelos Oikonomopoulos 2021-04-29 06:17:51 PDT
FWIW I think I've tracked down the issue with the uncaptured stderr that Mark ran into. I've updated the ruby test writer too, which was straightforward.

My remaining concern is this piece of code:

           outp.puts JSON.generate({
                name: @name,
                outputDir: $runnerDir,
                baseDir: baseDir,
                env: $envVars + @additionalEnv,
                outputName: @name.gsub(/(\\|\/)/, '_'),
                checkScript: filename,
                args: @arguments,
                failFile: "#{failFile}"
            })

in the playstation test writer. My patch removes the failFile and replaces it with a statusFile which contains a single line: $runid $exitcode $pf, where $pf is either P(ass) or F(ail).

Stephan, presumably something checks that failFile on your end? Would you be ok with adapting it to the changes in this patch?
Comment 18 Angelos Oikonomopoulos 2021-04-29 06:27:42 PDT
Created attachment 427342 [details]
Patch
Comment 19 Stephan Szabo 2021-04-29 10:09:29 PDT
(In reply to Angelos Oikonomopoulos from comment #17)
> FWIW I think I've tracked down the issue with the uncaptured stderr that
> Mark ran into. I've updated the ruby test writer too, which was
> straightforward.
> 
> My remaining concern is this piece of code:
> 
>            outp.puts JSON.generate({
>                 name: @name,
>                 outputDir: $runnerDir,
>                 baseDir: baseDir,
>                 env: $envVars + @additionalEnv,
>                 outputName: @name.gsub(/(\\|\/)/, '_'),
>                 checkScript: filename,
>                 args: @arguments,
>                 failFile: "#{failFile}"
>             })
> 
> in the playstation test writer. My patch removes the failFile and replaces
> it with a statusFile which contains a single line: $runid $exitcode $pf,
> where $pf is either P(ass) or F(ail).
> 
> Stephan, presumably something checks that failFile on your end? Would you be
> ok with adapting it to the changes in this patch?

Yes, that'd be fine. Actually, because of the weirdness for filesystem access for our case, I believe our driver is actually are checking the status and then making the fail file for the rest of the test harness, so I should be able to easily have it generate the statusFile in that format instead.
Comment 20 Angelos Oikonomopoulos 2021-05-05 01:20:11 PDT
Created attachment 427739 [details]
Patch
Comment 21 Angelos Oikonomopoulos 2021-05-06 04:56:22 PDT
Created attachment 427872 [details]
Patch
Comment 22 Angelos Oikonomopoulos 2021-05-06 06:45:21 PDT
(In reply to Stephan Szabo from comment #19)
> (In reply to Angelos Oikonomopoulos from comment #17)
[...]
> > Stephan, presumably something checks that failFile on your end? Would you be
> > ok with adapting it to the changes in this patch?
> 
> Yes, that'd be fine. Actually, because of the weirdness for filesystem
> access for our case, I believe our driver is actually are checking the
> status and then making the fail file for the rest of the test harness, so I
> should be able to easily have it generate the statusFile in that format
> instead.

Great! As I can't update your driver, at the moment the patch removes the failFile field from the json file. Do you prefer to handle this some other way?
Comment 23 Stephan Szabo 2021-05-06 09:54:10 PDT
(In reply to Angelos Oikonomopoulos from comment #22)
> (In reply to Stephan Szabo from comment #19)
> > (In reply to Angelos Oikonomopoulos from comment #17)
> [...]
> > > Stephan, presumably something checks that failFile on your end? Would you be
> > > ok with adapting it to the changes in this patch?
> > 
> > Yes, that'd be fine. Actually, because of the weirdness for filesystem
> > access for our case, I believe our driver is actually are checking the
> > status and then making the fail file for the rest of the test harness, so I
> > should be able to easily have it generate the statusFile in that format
> > instead.
> 
> Great! As I can't update your driver, at the moment the patch removes the
> failFile field from the json file. Do you prefer to handle this some other
> way?

I'll probably add statusFile to the json file in the end, but I think the above is fine for the current patch, since I may need to do some other related changes in practice.
Comment 24 Angelos Oikonomopoulos 2021-05-07 05:36:21 PDT
(In reply to Stephan Szabo from comment #23)
[...]
> I'll probably add statusFile to the json file in the end, but I think the
> above is fine for the current patch, since I may need to do some other
> related changes in practice.

Alright, thanks!

@Mark, the change to fix the stderr redirection was

-            cmd = "{ { { { #{shellCommand}; echo $? >&3; } | { #{outputHandler.call(@name)} ;} >&4; } 3>&1; } | { read xs; exit $xs; } } 4>&1\nexitCode=$?\n"
+            cmd = "{ { { { #{shellCommand} 2>&1; echo $? >&3; } | { #{outputHandler.call(@name)} ;} >&4; } 3>&1; } | { read xs; exit $xs; } } 4>&1\nexitCode=$?\n"

Do you feel comfortable trying to land the patch now?
Comment 25 Angelos Oikonomopoulos 2021-05-07 05:38:54 PDT
(In reply to Angelos Oikonomopoulos from comment #24)
[...]
> @Mark, the change to fix the stderr redirection was
> 
> -            cmd = "{ { { { #{shellCommand}; echo $? >&3; } | {
> #{outputHandler.call(@name)} ;} >&4; } 3>&1; } | { read xs; exit $xs; } }
> 4>&1\nexitCode=$?\n"
> +            cmd = "{ { { { #{shellCommand} 2>&1; echo $? >&3; } | {
> #{outputHandler.call(@name)} ;} >&4; } 3>&1; } | { read xs; exit $xs; } }
> 4>&1\nexitCode=$?\n"
> 
> Do you feel comfortable trying to land the patch now?

Note, I also took the opportunity to clarify the above pipeline with an ascii diagram in a preceding comment:

# +--------------------------------------------------------------------+                                                                                                                              
# | +-----------------------------------------------+                  |                                                                                                                              
# | | +--------------+     +-------------------+    |                  |                                                                                                                              
# | | | cmd 1 ----> 1|---> |0 --> outH 1 ---> 4|-> 4|---------------> 1|                                                                                                                              
# | | |     2 /      |     +-------------------+    |   +-----------+  |                                                                                                                              
# | | |echo $? 0 -> 3|---------------------------> 1|-> |0 read xs  |  |                                                                                                                              
# | | +--------------+                              |   |  exit $xs |  |                                                                                                                              
# | |                                               |   +-----------+  |                                                                                                                              
# | +-----------------------------------------------+                  |                                                                                                                              
# +--------------------------------------------------------------------+
Comment 26 Mark Lam 2021-05-10 10:31:07 PDT
Comment on attachment 427872 [details]
Patch

Looks reasonable to me.
Comment 27 EWS 2021-05-11 01:34:15 PDT
Committed r277318 (237577@main): <https://commits.webkit.org/237577@main>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 427872 [details].