171615 – webkitpy: Restart driver between test shards

NEW 171615

webkitpy: Restart driver between test shards

https://bugs.webkit.org/show_bug.cgi?id=171615

Summary webkitpy: Restart driver between test shards

Jonathan Bedard

Reported 2017-05-03 13:16:49 PDT

One the cause of flakey tests is that some state can be retained between tests. To mitigate this, we could restart the driver between test shards.

Attachments
Patch (2.58 KB, patch) 2017-05-03 14:01 PDT, Jonathan Bedard	no flags	Details Formatted Diff Diff
Archive of layout-test-results from ews101 for mac-elcapitan (780.57 KB, application/zip) 2017-05-03 15:07 PDT, Build Bot	no flags	Details
Archive of layout-test-results from ews107 for mac-elcapitan-wk2 (815.87 KB, application/zip) 2017-05-03 15:13 PDT, Build Bot	no flags	Details
Archive of layout-test-results from ews112 for mac-elcapitan (1.53 MB, application/zip) 2017-05-03 15:19 PDT, Build Bot	no flags	Details
Archive of layout-test-results from ews121 for ios-simulator-wk2 (14.16 MB, application/zip) 2017-05-03 15:39 PDT, Build Bot	no flags	Details
Patch (2.59 KB, patch) 2021-08-05 15:35 PDT, Jonathan Bedard	no flags	Details Formatted Diff Diff
Patch (2.78 KB, patch) 2021-08-09 08:52 PDT, Jonathan Bedard	no flags	Details Formatted Diff Diff
Show Obsolete (6) View All Add attachment proposed patch, testcase, etc.

Alexey Proskuryakov

Comment 1 2017-05-03 13:20:25 PDT

What state can persist in the driver? Do we have evidence that this happens?

Jonathan Bedard

Comment 2 2017-05-03 13:25:40 PDT

(In reply to Alexey Proskuryakov from comment #1) > What state can persist in the driver? Do we have evidence that this happens? There is evidence that this has happened, although, I couldn't give an example of it happening right now. Talking to Dean and Ryan, this is a possible technique to decrease the number of flakey tests.

Jonathan Bedard

Comment 3 2017-05-03 14:01:11 PDT

Created attachment 308947 [details] Patch

Build Bot

Comment 4 2017-05-03 15:07:54 PDT

Comment on attachment 308947 [details] Patch Attachment 308947 [details] did not pass mac-ews (mac): Output: http://webkit-queues.webkit.org/results/3666850 New failing tests: imported/w3c/web-platform-tests/html/browsers/windows/browsing-context.html

Build Bot

Comment 5 2017-05-03 15:07:55 PDT

Created attachment 308962 [details] Archive of layout-test-results from ews101 for mac-elcapitan The attached test failures were seen while running run-webkit-tests on the mac-ews. Bot: ews101 Port: mac-elcapitan Platform: Mac OS X 10.11.6

Build Bot

Comment 6 2017-05-03 15:13:10 PDT

Comment on attachment 308947 [details] Patch Attachment 308947 [details] did not pass mac-wk2-ews (mac-wk2): Output: http://webkit-queues.webkit.org/results/3666856 New failing tests: imported/w3c/web-platform-tests/html/browsers/windows/browsing-context.html

Build Bot

Comment 7 2017-05-03 15:13:11 PDT

Created attachment 308964 [details] Archive of layout-test-results from ews107 for mac-elcapitan-wk2 The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews. Bot: ews107 Port: mac-elcapitan-wk2 Platform: Mac OS X 10.11.6

Build Bot

Comment 8 2017-05-03 15:19:09 PDT

Comment on attachment 308947 [details] Patch Attachment 308947 [details] did not pass mac-debug-ews (mac): Output: http://webkit-queues.webkit.org/results/3666854 New failing tests: imported/w3c/web-platform-tests/html/browsers/windows/browsing-context.html

Build Bot

Comment 9 2017-05-03 15:19:10 PDT

Created attachment 308965 [details] Archive of layout-test-results from ews112 for mac-elcapitan The attached test failures were seen while running run-webkit-tests on the mac-debug-ews. Bot: ews112 Port: mac-elcapitan Platform: Mac OS X 10.11.6

Ryan Haddad

Comment 10 2017-05-03 15:24:34 PDT

(In reply to Build Bot from comment #8) > Comment on attachment 308947 [details] > Patch > > Attachment 308947 [details] did not pass mac-debug-ews (mac): > Output: http://webkit-queues.webkit.org/results/3666854 > > New failing tests: > imported/w3c/web-platform-tests/html/browsers/windows/browsing-context.html --- /Volumes/Data/EWS/WebKit/WebKitBuild/Release/layout-test-results/imported/w3c/web-platform-tests/html/browsers/windows/browsing-context-expected.txt +++ /Volumes/Data/EWS/WebKit/WebKitBuild/Release/layout-test-results/imported/w3c/web-platform-tests/html/browsers/windows/browsing-context-actual.txt @@ -2,5 +2,5 @@ FAIL Check that browsing context has new, ready HTML document assert_equals: The document's encoding should be 'UTF-8'. expected "UTF-8" but got "windows-1252" PASS Check that new document nodes extant, empty PASS Check the document properties corresponding to the creator browsing context -FAIL Check the history.length of the created browsing context assert_equals: The history.length should be 1. expected 1 but got 100 +FAIL Check the history.length of the created browsing context assert_equals: The history.length should be 1. expected 1 but got 4 I guess this is a progression?

Build Bot

Comment 11 2017-05-03 15:39:58 PDT

Comment on attachment 308947 [details] Patch Attachment 308947 [details] did not pass ios-sim-ews (ios-simulator-wk2): Output: http://webkit-queues.webkit.org/results/3666897 New failing tests: imported/w3c/web-platform-tests/html/browsers/windows/browsing-context.html

Build Bot

Comment 12 2017-05-03 15:39:59 PDT

Created attachment 308975 [details] Archive of layout-test-results from ews121 for ios-simulator-wk2 The attached test failures were seen while running run-webkit-tests on the ios-sim-ews. Bot: ews121 Port: ios-simulator-wk2 Platform: Mac OS X 10.11.6

Alexey Proskuryakov

Comment 13 2017-05-03 16:54:20 PDT

I would be against making this change without fully understanding why it's needed.

Jonathan Bedard

Comment 14 2017-05-04 16:05:14 PDT

(In reply to Alexey Proskuryakov from comment #13) > I would be against making this change without fully understanding why it's > needed. What I find baffling here is EWS indicates that new tests are failing. This would mean that we have tests which are relying on state set in earlier tests. I'd like to see if I can reproduce any of these failures locally.

Ryosuke Niwa

Comment 15 2017-05-08 20:58:29 PDT

Comment on attachment 308947 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=308947&action=review > Tools/ChangeLog:3 > + webkitpy: Restart driver between test shards This will slow down tests. > Tools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:265 > + optparse.make_option("--persistent-driver", action="store_true", > + default=False, help="Attempt to keep the driver running as long as possible. Without this option, the driver will restart between shards."), Given the other option to restart the driver between each test is called --run-singly, perhaps we should call this --run-persistently. It most certainly shouldn't be a noun.

Jonathan Bedard

Comment 16 2017-05-09 08:11:00 PDT

(In reply to Ryosuke Niwa from comment #15) > Comment on attachment 308947 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=308947&action=review > > > Tools/ChangeLog:3 > > + webkitpy: Restart driver between test shards > > This will slow down tests. > Yes, it will. But, the results so far indicate that state is being retained between tests and effecting the results. I need to investigate precisely why this is, but I think our test suite being slower and deterministic is better than it being fast but flakey.

Ryosuke Niwa

Comment 17 2017-05-09 14:19:05 PDT

(In reply to Jonathan Bedard from comment #16) > (In reply to Ryosuke Niwa from comment #15) > > Comment on attachment 308947 [details] > > Patch > > > > View in context: > > https://bugs.webkit.org/attachment.cgi?id=308947&action=review > > > > > Tools/ChangeLog:3 > > > + webkitpy: Restart driver between test shards > > > > This will slow down tests. > > > > Yes, it will. But, the results so far indicate that state is being retained > between tests and effecting the results. I need to investigate precisely > why this is, but I think our test suite being slower and deterministic is > better than it being fast but flakey. I'm not so sure about that. If that were the case, we would be restarting DRT/WTR between each test but we don't. Also, people have argued for using the same instance to catch hard-to-reproduce bugs that only manifest when the process had been running for a while such as leaks that accumulate over time, or states not being cleared between page loads.

Ryosuke Niwa

Comment 18 2017-05-09 14:19:43 PDT

The change of this magnitude should probably be discussed on webkit-dev first and get a consensus.

Jonathan Bedard

Comment 19 2017-05-09 14:45:52 PDT

(In reply to Ryosuke Niwa from comment #17) > ... > > I'm not so sure about that. If that were the case, we would be restarting > DRT/WTR between each test but we don't. Also, people have argued for using > the same instance to catch hard-to-reproduce bugs that only manifest when > the process had been running for a while such as leaks that accumulate over > time, or states not being cleared between page loads. The central problem with the way we currently do things is that it's not deterministic. I agree that we shouldn't restart DRT/WTR between each test. However, the way we currently run things, any given test run will change how the tests are sharded. Meaning that hard-to-reproduce failures will get reproduced, but without a solid sequence of steps to reproduce them. This is why the change in this bug only restarts the test runners between shards. This means that we get the best of both worlds. A (mostly) persistent test runner that will retain state within a test shard, but, will also provide predictable and repeatable results. I don't think that test failure we cannot reproduce are helpful. They don't get acted on.

Jonathan Bedard

Comment 20 2017-05-09 14:46:27 PDT

(In reply to Ryosuke Niwa from comment #18) > The change of this magnitude should probably be discussed on webkit-dev > first and get a consensus. I agree, this bug was because Dean wanted to see what this change would look like. I have no intention of committing this until there is more of a consensus and the issue is better understood.

Ryosuke Niwa

Comment 21 2017-05-09 14:48:44 PDT

(In reply to Jonathan Bedard from comment #19) > (In reply to Ryosuke Niwa from comment #17) > > ... > > > > I'm not so sure about that. If that were the case, we would be restarting > > DRT/WTR between each test but we don't. Also, people have argued for using > > the same instance to catch hard-to-reproduce bugs that only manifest when > > the process had been running for a while such as leaks that accumulate over > > time, or states not being cleared between page loads. > > The central problem with the way we currently do things is that it's not > deterministic. Well, we've always had that issue, and this issue had been discussed extensively in the past. If you're changing the status quo, you must get buy-in from other contributors who are making WebKit code changes. > I agree that we shouldn't restart DRT/WTR between each test. However, the > way we currently run things, any given test run will change how the tests > are sharded. Meaning that hard-to-reproduce failures will get reproduced, > but without a solid sequence of steps to reproduce them. This is why the > change in this bug only restarts the test runners between shards. This > means that we get the best of both worlds. A (mostly) persistent test > runner that will retain state within a test shard, but, will also provide > predictable and repeatable results. Well, that's one solution to the problem of inconsistent states, and people have varying opinions on this matter. As such, I don't think we should be making changes like this without getting a buy-in from other engineers. > I don't think that test failure we cannot reproduce are helpful. They don't > get acted on. Except they do. Alexey fixed a number of bugs like that, and some of them turned out to be serious IPC bugs, and states in WebView / WKView not being cleared across page loads.

Jonathan Bedard

Comment 22 2021-08-05 15:35:51 PDT

Created attachment 435032 [details] Patch

Jonathan Bedard

Comment 23 2021-08-05 15:43:52 PDT

Reviving this bug. We have some data from https://bugs.webkit.org/show_bug.cgi?id=228429 that indicates that we have a handful of tests that fail when we change the order of shards. We need to change the order shards run in, to speed up run-webkit-tests, but we can't do that as long as doing so causes flakey failures. This should mean that the combination of this change and https://bugs.webkit.org/show_bug.cgi?id=228429 will be a performance improvement.

Jonathan Bedard

Comment 24 2021-08-09 08:52:57 PDT

Created attachment 435186 [details] Patch

Radar WebKit Bug Importer

Comment 25 2021-09-16 23:20:05 PDT

<rdar://problem/83227925>

Note You need to log in before you can comment on or make changes to this bug.

Status NEW

Resolution

Priority P2

Severity Normal

Classification Unclassified

Version Other

Hardware Unspecified

OS Unspecified

Product WebKit

Component Tools / Tests

Assignee

Jonathan Bedard

Reported

2017-05-03 13:16 PDT

Modified

2021-09-16 23:20 PDT History

CC List

11 users Show

URL

Keywords InRadar

Depends on

Blocks

228429

Dependencies

tree graph