Bug 171615 - webkitpy: Restart driver between test shards
Summary: webkitpy: Restart driver between test shards
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests (show other bugs)
Version: Other
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Jonathan Bedard
URL:
Keywords: InRadar
Depends on:
Blocks: 228429
  Show dependency treegraph
 
Reported: 2017-05-03 13:16 PDT by Jonathan Bedard
Modified: 2021-09-16 23:20 PDT (History)
11 users (show)

See Also:


Attachments
Patch (2.58 KB, patch)
2017-05-03 14:01 PDT, Jonathan Bedard
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from ews101 for mac-elcapitan (780.57 KB, application/zip)
2017-05-03 15:07 PDT, Build Bot
no flags Details
Archive of layout-test-results from ews107 for mac-elcapitan-wk2 (815.87 KB, application/zip)
2017-05-03 15:13 PDT, Build Bot
no flags Details
Archive of layout-test-results from ews112 for mac-elcapitan (1.53 MB, application/zip)
2017-05-03 15:19 PDT, Build Bot
no flags Details
Archive of layout-test-results from ews121 for ios-simulator-wk2 (14.16 MB, application/zip)
2017-05-03 15:39 PDT, Build Bot
no flags Details
Patch (2.59 KB, patch)
2021-08-05 15:35 PDT, Jonathan Bedard
no flags Details | Formatted Diff | Diff
Patch (2.78 KB, patch)
2021-08-09 08:52 PDT, Jonathan Bedard
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Jonathan Bedard 2017-05-03 13:16:49 PDT
One the cause of flakey tests is that some state can be retained between tests.  To mitigate this, we could restart the driver between test shards.
Comment 1 Alexey Proskuryakov 2017-05-03 13:20:25 PDT
What state can persist in the driver? Do we have evidence that this happens?
Comment 2 Jonathan Bedard 2017-05-03 13:25:40 PDT
(In reply to Alexey Proskuryakov from comment #1)
> What state can persist in the driver? Do we have evidence that this happens?

There is evidence that this has happened, although, I couldn't give an example of it happening right now.

Talking to Dean and Ryan, this is a possible technique to decrease the number of flakey tests.
Comment 3 Jonathan Bedard 2017-05-03 14:01:11 PDT
Created attachment 308947 [details]
Patch
Comment 4 Build Bot 2017-05-03 15:07:54 PDT
Comment on attachment 308947 [details]
Patch

Attachment 308947 [details] did not pass mac-ews (mac):
Output: http://webkit-queues.webkit.org/results/3666850

New failing tests:
imported/w3c/web-platform-tests/html/browsers/windows/browsing-context.html
Comment 5 Build Bot 2017-05-03 15:07:55 PDT
Created attachment 308962 [details]
Archive of layout-test-results from ews101 for mac-elcapitan

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews101  Port: mac-elcapitan  Platform: Mac OS X 10.11.6
Comment 6 Build Bot 2017-05-03 15:13:10 PDT
Comment on attachment 308947 [details]
Patch

Attachment 308947 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.webkit.org/results/3666856

New failing tests:
imported/w3c/web-platform-tests/html/browsers/windows/browsing-context.html
Comment 7 Build Bot 2017-05-03 15:13:11 PDT
Created attachment 308964 [details]
Archive of layout-test-results from ews107 for mac-elcapitan-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews107  Port: mac-elcapitan-wk2  Platform: Mac OS X 10.11.6
Comment 8 Build Bot 2017-05-03 15:19:09 PDT
Comment on attachment 308947 [details]
Patch

Attachment 308947 [details] did not pass mac-debug-ews (mac):
Output: http://webkit-queues.webkit.org/results/3666854

New failing tests:
imported/w3c/web-platform-tests/html/browsers/windows/browsing-context.html
Comment 9 Build Bot 2017-05-03 15:19:10 PDT
Created attachment 308965 [details]
Archive of layout-test-results from ews112 for mac-elcapitan

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews112  Port: mac-elcapitan  Platform: Mac OS X 10.11.6
Comment 10 Ryan Haddad 2017-05-03 15:24:34 PDT
(In reply to Build Bot from comment #8)
> Comment on attachment 308947 [details]
> Patch
> 
> Attachment 308947 [details] did not pass mac-debug-ews (mac):
> Output: http://webkit-queues.webkit.org/results/3666854
> 
> New failing tests:
> imported/w3c/web-platform-tests/html/browsers/windows/browsing-context.html

--- /Volumes/Data/EWS/WebKit/WebKitBuild/Release/layout-test-results/imported/w3c/web-platform-tests/html/browsers/windows/browsing-context-expected.txt
+++ /Volumes/Data/EWS/WebKit/WebKitBuild/Release/layout-test-results/imported/w3c/web-platform-tests/html/browsers/windows/browsing-context-actual.txt
@@ -2,5 +2,5 @@
 FAIL Check that browsing context has new, ready HTML document assert_equals: The document's encoding should be 'UTF-8'. expected "UTF-8" but got "windows-1252"
 PASS Check that new document nodes extant, empty 
 PASS Check the document properties corresponding to the creator browsing context 
-FAIL Check the history.length of the created browsing context assert_equals: The history.length should be 1. expected 1 but got 100
+FAIL Check the history.length of the created browsing context assert_equals: The history.length should be 1. expected 1 but got 4
 
I guess this is a progression?
Comment 11 Build Bot 2017-05-03 15:39:58 PDT
Comment on attachment 308947 [details]
Patch

Attachment 308947 [details] did not pass ios-sim-ews (ios-simulator-wk2):
Output: http://webkit-queues.webkit.org/results/3666897

New failing tests:
imported/w3c/web-platform-tests/html/browsers/windows/browsing-context.html
Comment 12 Build Bot 2017-05-03 15:39:59 PDT
Created attachment 308975 [details]
Archive of layout-test-results from ews121 for ios-simulator-wk2

The attached test failures were seen while running run-webkit-tests on the ios-sim-ews.
Bot: ews121  Port: ios-simulator-wk2  Platform: Mac OS X 10.11.6
Comment 13 Alexey Proskuryakov 2017-05-03 16:54:20 PDT
I would be against making this change without fully understanding why it's needed.
Comment 14 Jonathan Bedard 2017-05-04 16:05:14 PDT
(In reply to Alexey Proskuryakov from comment #13)
> I would be against making this change without fully understanding why it's
> needed.

What I find baffling here is EWS indicates that new tests are failing.  This would mean that we have tests which are relying on state set in earlier tests.  I'd like to see if I can reproduce any of these failures locally.
Comment 15 Ryosuke Niwa 2017-05-08 20:58:29 PDT
Comment on attachment 308947 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=308947&action=review

> Tools/ChangeLog:3
> +        webkitpy: Restart driver between test shards

This will slow down tests.

> Tools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:265
> +        optparse.make_option("--persistent-driver", action="store_true",
> +            default=False, help="Attempt to keep the driver running as long as possible. Without this option, the driver will restart between shards."),

Given the other option to restart the driver between each test is called --run-singly, perhaps we should call this --run-persistently.
It most certainly shouldn't be a noun.
Comment 16 Jonathan Bedard 2017-05-09 08:11:00 PDT
(In reply to Ryosuke Niwa from comment #15)
> Comment on attachment 308947 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=308947&action=review
> 
> > Tools/ChangeLog:3
> > +        webkitpy: Restart driver between test shards
> 
> This will slow down tests.
> 

Yes, it will.  But, the results so far indicate that state is being retained between tests and effecting the results.  I need to investigate precisely why this is, but I think our test suite being slower and deterministic is better than it being fast but flakey.
Comment 17 Ryosuke Niwa 2017-05-09 14:19:05 PDT
(In reply to Jonathan Bedard from comment #16)
> (In reply to Ryosuke Niwa from comment #15)
> > Comment on attachment 308947 [details]
> > Patch
> > 
> > View in context:
> > https://bugs.webkit.org/attachment.cgi?id=308947&action=review
> > 
> > > Tools/ChangeLog:3
> > > +        webkitpy: Restart driver between test shards
> > 
> > This will slow down tests.
> > 
> 
> Yes, it will.  But, the results so far indicate that state is being retained
> between tests and effecting the results.  I need to investigate precisely
> why this is, but I think our test suite being slower and deterministic is
> better than it being fast but flakey.

I'm not so sure about that. If that were the case, we would be restarting DRT/WTR between each test but we don't. Also, people have argued for using the same instance to catch hard-to-reproduce bugs that only manifest when the process had been running for a while such as leaks that accumulate over time, or states not being cleared between page  loads.
Comment 18 Ryosuke Niwa 2017-05-09 14:19:43 PDT
The change of this magnitude should probably be discussed on webkit-dev first and get a consensus.
Comment 19 Jonathan Bedard 2017-05-09 14:45:52 PDT
(In reply to Ryosuke Niwa from comment #17)
> ...
> 
> I'm not so sure about that. If that were the case, we would be restarting
> DRT/WTR between each test but we don't. Also, people have argued for using
> the same instance to catch hard-to-reproduce bugs that only manifest when
> the process had been running for a while such as leaks that accumulate over
> time, or states not being cleared between page  loads.

The central problem with the way we currently do things is that it's not deterministic.

I agree that we shouldn't restart DRT/WTR between each test.  However, the way we currently run things, any given test run will change how the tests are sharded.  Meaning that hard-to-reproduce failures will get reproduced, but without a solid sequence of steps to reproduce them.  This is why the change in this bug only restarts the test runners between shards.  This means that we get the best of both worlds.  A (mostly) persistent test runner that will retain state within a test shard, but, will also provide predictable and repeatable results.

I don't think that test failure we cannot reproduce are helpful.  They don't get acted on.
Comment 20 Jonathan Bedard 2017-05-09 14:46:27 PDT
(In reply to Ryosuke Niwa from comment #18)
> The change of this magnitude should probably be discussed on webkit-dev
> first and get a consensus.

I agree, this bug was because Dean wanted to see what this change would look like.  I have no intention of committing this until there is more of a consensus and the issue is better understood.
Comment 21 Ryosuke Niwa 2017-05-09 14:48:44 PDT
(In reply to Jonathan Bedard from comment #19)
> (In reply to Ryosuke Niwa from comment #17)
> > ...
> > 
> > I'm not so sure about that. If that were the case, we would be restarting
> > DRT/WTR between each test but we don't. Also, people have argued for using
> > the same instance to catch hard-to-reproduce bugs that only manifest when
> > the process had been running for a while such as leaks that accumulate over
> > time, or states not being cleared between page  loads.
> 
> The central problem with the way we currently do things is that it's not
> deterministic.

Well, we've always had that issue, and this issue had been discussed extensively in the past. If you're changing the status quo, you must get buy-in from other contributors who are making WebKit code changes.

> I agree that we shouldn't restart DRT/WTR between each test.  However, the
> way we currently run things, any given test run will change how the tests
> are sharded.  Meaning that hard-to-reproduce failures will get reproduced,
> but without a solid sequence of steps to reproduce them.  This is why the
> change in this bug only restarts the test runners between shards.  This
> means that we get the best of both worlds.  A (mostly) persistent test
> runner that will retain state within a test shard, but, will also provide
> predictable and repeatable results.

Well, that's one solution to the problem of inconsistent states, and people have varying opinions on this matter. As such, I don't think we should be making changes like this without getting a buy-in from other engineers.

> I don't think that test failure we cannot reproduce are helpful.  They don't
> get acted on.

Except they do. Alexey fixed a number of bugs like that, and some of them turned out to be serious IPC bugs, and states in WebView / WKView not being cleared across page loads.
Comment 22 Jonathan Bedard 2021-08-05 15:35:51 PDT
Created attachment 435032 [details]
Patch
Comment 23 Jonathan Bedard 2021-08-05 15:43:52 PDT
Reviving this bug.

We have some data from https://bugs.webkit.org/show_bug.cgi?id=228429 that indicates that we have a handful of tests that fail when we change the order of shards. We need to change the order shards run in, to speed up run-webkit-tests, but we can't do that as long as doing so causes flakey failures. This should mean that the combination of this change and https://bugs.webkit.org/show_bug.cgi?id=228429 will be a performance improvement.
Comment 24 Jonathan Bedard 2021-08-09 08:52:57 PDT
Created attachment 435186 [details]
Patch
Comment 25 Radar WebKit Bug Importer 2021-09-16 23:20:05 PDT
<rdar://problem/83227925>