Bug 170149 - EWS should kill processes before attempting to run tests
Summary: EWS should kill processes before attempting to run tests
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Jonathan Bedard
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2017-03-27 17:37 PDT by Jonathan Bedard
Modified: 2018-06-19 14:03 PDT (History)
7 users (show)

See Also:


Attachments
Patch (10.58 KB, patch)
2017-03-27 17:43 PDT, Jonathan Bedard
clopez: review+
buildbot: commit-queue-
Details | Formatted Diff | Diff
Archive of layout-test-results from ews124 for ios-simulator-wk2 (1.12 MB, application/zip)
2017-03-27 23:15 PDT, Build Bot
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Jonathan Bedard 2017-03-27 17:37:25 PDT
Recently, we saw an issue where Simulators remained booted between EWS test runs, greatly slowing them down.  To prevent this, all relevant processes should be killed before running any tests.
Comment 1 Radar WebKit Bug Importer 2017-03-27 17:38:48 PDT
<rdar://problem/31287788>
Comment 2 Jonathan Bedard 2017-03-27 17:43:39 PDT
Created attachment 305530 [details]
Patch
Comment 3 Build Bot 2017-03-27 23:15:53 PDT
Comment on attachment 305530 [details]
Patch

Attachment 305530 [details] did not pass ios-sim-ews (ios-simulator-wk2):
Output: http://webkit-queues.webkit.org/results/3423708

New failing tests:
http/tests/incremental/stylesheet-body-incremental-rendering.html
Comment 4 Build Bot 2017-03-27 23:15:56 PDT
Created attachment 305566 [details]
Archive of layout-test-results from ews124 for ios-simulator-wk2

The attached test failures were seen while running run-webkit-tests on the ios-sim-ews.
Bot: ews124  Port: ios-simulator-wk2  Platform: Mac OS X 10.11.6
Comment 5 Alexey Proskuryakov 2017-03-28 17:16:16 PDT
This needs to be considered in the context of the issue that we have on Mac EWS, where DumpRenderTree processes get stuck.

This patch would kill them, but it will not resolve the root cause that makes CG calls freeze, so it may make the behavior worse overall. Not sure how to determine that.
Comment 6 Carlos Alberto Lopez Perez 2017-04-03 18:13:22 PDT
(In reply to Alexey Proskuryakov from comment #5)
> This needs to be considered in the context of the issue that we have on Mac
> EWS, where DumpRenderTree processes get stuck.
> 
> This patch would kill them, but it will not resolve the root cause that
> makes CG calls freeze, so it may make the behavior worse overall. Not sure
> how to determine that.

This mimics the behaviour of the bots. The script kill-old-processes is executed by the bots on each new run.

I think that having the behaviour on the bots and on the EWS is something good.
Comment 7 Carlos Alberto Lopez Perez 2017-04-03 18:13:54 PDT
(In reply to Carlos Alberto Lopez Perez from comment #6)
> I think that having the behaviour on the bots and on the EWS is something
> good.

 ^^^^ I think that having the (same) behaviour on the bots and on the EWS is something good.
Comment 8 Alexey Proskuryakov 2017-04-03 18:24:33 PDT
I would agree in theory, but in practice, we don't get the same behavior with stuck CG calls on regular bots.
Comment 9 Carlos Alberto Lopez Perez 2018-06-19 08:05:43 PDT
(In reply to Alexey Proskuryakov from comment #8)
> I would agree in theory, but in practice, we don't get the same behavior
> with stuck CG calls on regular bots.

Not sure if this is an still an issue for you, but giving the timing of the issue I guess its not.

In any I think this patch will be useful anyway.

Since we enabled tests on the GTK EWS in bug 186559 I'm seeing several zombie process on our EWS bot after each run that should be killed to allow it continue without major issues.

I guess the same happens on our bots, but since there we rely on the kill-old-process step that runs always after each run, that is not an issue on the bots.


I can deploy some workaround locally to kill or restart the EWS more often to avoid this, but having support in the tooling itself will be much better.
Comment 10 Carlos Alberto Lopez Perez 2018-06-19 08:07:02 PDT
Comment on attachment 305530 [details]
Patch

This makes the EWS work more similar to the bots (those always kill old process before starting a new run), so I think this is an overall improvement to make tests on the EWS more reproducible.

I will like to see this landed.
Comment 11 Jonathan Bedard 2018-06-19 14:03:24 PDT
Just a few notes on this:

We aren't seeing these sorts of issues on Mac or iOS Simulator EWS any more. If memory serves, we resolved them by adding exit handlers to webkitpy Simulator management and investigating the root-cause of the zombie processes.

On that note, we had at least one instance in the last year where the zombie processes on EWS were a symptom of a bug in WebKit's multi-process code which was orphaning processes. It's probably worth root-causing why those processes are orphaned.

I believe that in case of EWS machines managed by Apple, we reboot them every 5-10 test runs, Aakash or Lucas would likely know the exact configuration.  I wonder if this might be another solution to the problems the GTK bots are experiencing.

That being said, I'm not opposed to this patch.