NEW 171615
webkitpy: Restart driver between test shards
https://bugs.webkit.org/show_bug.cgi?id=171615
Summary webkitpy: Restart driver between test shards
Jonathan Bedard
Reported 2017-05-03 13:16:49 PDT
One the cause of flakey tests is that some state can be retained between tests. To mitigate this, we could restart the driver between test shards.
Attachments
Patch (2.58 KB, patch)
2017-05-03 14:01 PDT, Jonathan Bedard
no flags
Archive of layout-test-results from ews101 for mac-elcapitan (780.57 KB, application/zip)
2017-05-03 15:07 PDT, Build Bot
no flags
Archive of layout-test-results from ews107 for mac-elcapitan-wk2 (815.87 KB, application/zip)
2017-05-03 15:13 PDT, Build Bot
no flags
Archive of layout-test-results from ews112 for mac-elcapitan (1.53 MB, application/zip)
2017-05-03 15:19 PDT, Build Bot
no flags
Archive of layout-test-results from ews121 for ios-simulator-wk2 (14.16 MB, application/zip)
2017-05-03 15:39 PDT, Build Bot
no flags
Patch (2.59 KB, patch)
2021-08-05 15:35 PDT, Jonathan Bedard
no flags
Patch (2.78 KB, patch)
2021-08-09 08:52 PDT, Jonathan Bedard
no flags
Alexey Proskuryakov
Comment 1 2017-05-03 13:20:25 PDT
What state can persist in the driver? Do we have evidence that this happens?
Jonathan Bedard
Comment 2 2017-05-03 13:25:40 PDT
(In reply to Alexey Proskuryakov from comment #1) > What state can persist in the driver? Do we have evidence that this happens? There is evidence that this has happened, although, I couldn't give an example of it happening right now. Talking to Dean and Ryan, this is a possible technique to decrease the number of flakey tests.
Jonathan Bedard
Comment 3 2017-05-03 14:01:11 PDT
Build Bot
Comment 4 2017-05-03 15:07:54 PDT
Comment on attachment 308947 [details] Patch Attachment 308947 [details] did not pass mac-ews (mac): Output: http://webkit-queues.webkit.org/results/3666850 New failing tests: imported/w3c/web-platform-tests/html/browsers/windows/browsing-context.html
Build Bot
Comment 5 2017-05-03 15:07:55 PDT
Created attachment 308962 [details] Archive of layout-test-results from ews101 for mac-elcapitan The attached test failures were seen while running run-webkit-tests on the mac-ews. Bot: ews101 Port: mac-elcapitan Platform: Mac OS X 10.11.6
Build Bot
Comment 6 2017-05-03 15:13:10 PDT
Comment on attachment 308947 [details] Patch Attachment 308947 [details] did not pass mac-wk2-ews (mac-wk2): Output: http://webkit-queues.webkit.org/results/3666856 New failing tests: imported/w3c/web-platform-tests/html/browsers/windows/browsing-context.html
Build Bot
Comment 7 2017-05-03 15:13:11 PDT
Created attachment 308964 [details] Archive of layout-test-results from ews107 for mac-elcapitan-wk2 The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews. Bot: ews107 Port: mac-elcapitan-wk2 Platform: Mac OS X 10.11.6
Build Bot
Comment 8 2017-05-03 15:19:09 PDT
Comment on attachment 308947 [details] Patch Attachment 308947 [details] did not pass mac-debug-ews (mac): Output: http://webkit-queues.webkit.org/results/3666854 New failing tests: imported/w3c/web-platform-tests/html/browsers/windows/browsing-context.html
Build Bot
Comment 9 2017-05-03 15:19:10 PDT
Created attachment 308965 [details] Archive of layout-test-results from ews112 for mac-elcapitan The attached test failures were seen while running run-webkit-tests on the mac-debug-ews. Bot: ews112 Port: mac-elcapitan Platform: Mac OS X 10.11.6
Ryan Haddad
Comment 10 2017-05-03 15:24:34 PDT
(In reply to Build Bot from comment #8) > Comment on attachment 308947 [details] > Patch > > Attachment 308947 [details] did not pass mac-debug-ews (mac): > Output: http://webkit-queues.webkit.org/results/3666854 > > New failing tests: > imported/w3c/web-platform-tests/html/browsers/windows/browsing-context.html --- /Volumes/Data/EWS/WebKit/WebKitBuild/Release/layout-test-results/imported/w3c/web-platform-tests/html/browsers/windows/browsing-context-expected.txt +++ /Volumes/Data/EWS/WebKit/WebKitBuild/Release/layout-test-results/imported/w3c/web-platform-tests/html/browsers/windows/browsing-context-actual.txt @@ -2,5 +2,5 @@ FAIL Check that browsing context has new, ready HTML document assert_equals: The document's encoding should be 'UTF-8'. expected "UTF-8" but got "windows-1252" PASS Check that new document nodes extant, empty PASS Check the document properties corresponding to the creator browsing context -FAIL Check the history.length of the created browsing context assert_equals: The history.length should be 1. expected 1 but got 100 +FAIL Check the history.length of the created browsing context assert_equals: The history.length should be 1. expected 1 but got 4 I guess this is a progression?
Build Bot
Comment 11 2017-05-03 15:39:58 PDT
Comment on attachment 308947 [details] Patch Attachment 308947 [details] did not pass ios-sim-ews (ios-simulator-wk2): Output: http://webkit-queues.webkit.org/results/3666897 New failing tests: imported/w3c/web-platform-tests/html/browsers/windows/browsing-context.html
Build Bot
Comment 12 2017-05-03 15:39:59 PDT
Created attachment 308975 [details] Archive of layout-test-results from ews121 for ios-simulator-wk2 The attached test failures were seen while running run-webkit-tests on the ios-sim-ews. Bot: ews121 Port: ios-simulator-wk2 Platform: Mac OS X 10.11.6
Alexey Proskuryakov
Comment 13 2017-05-03 16:54:20 PDT
I would be against making this change without fully understanding why it's needed.
Jonathan Bedard
Comment 14 2017-05-04 16:05:14 PDT
(In reply to Alexey Proskuryakov from comment #13) > I would be against making this change without fully understanding why it's > needed. What I find baffling here is EWS indicates that new tests are failing. This would mean that we have tests which are relying on state set in earlier tests. I'd like to see if I can reproduce any of these failures locally.
Ryosuke Niwa
Comment 15 2017-05-08 20:58:29 PDT
Comment on attachment 308947 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=308947&action=review > Tools/ChangeLog:3 > + webkitpy: Restart driver between test shards This will slow down tests. > Tools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:265 > + optparse.make_option("--persistent-driver", action="store_true", > + default=False, help="Attempt to keep the driver running as long as possible. Without this option, the driver will restart between shards."), Given the other option to restart the driver between each test is called --run-singly, perhaps we should call this --run-persistently. It most certainly shouldn't be a noun.
Jonathan Bedard
Comment 16 2017-05-09 08:11:00 PDT
(In reply to Ryosuke Niwa from comment #15) > Comment on attachment 308947 [details] > Patch > > View in context: > https://bugs.webkit.org/attachment.cgi?id=308947&action=review > > > Tools/ChangeLog:3 > > + webkitpy: Restart driver between test shards > > This will slow down tests. > Yes, it will. But, the results so far indicate that state is being retained between tests and effecting the results. I need to investigate precisely why this is, but I think our test suite being slower and deterministic is better than it being fast but flakey.
Ryosuke Niwa
Comment 17 2017-05-09 14:19:05 PDT
(In reply to Jonathan Bedard from comment #16) > (In reply to Ryosuke Niwa from comment #15) > > Comment on attachment 308947 [details] > > Patch > > > > View in context: > > https://bugs.webkit.org/attachment.cgi?id=308947&action=review > > > > > Tools/ChangeLog:3 > > > + webkitpy: Restart driver between test shards > > > > This will slow down tests. > > > > Yes, it will. But, the results so far indicate that state is being retained > between tests and effecting the results. I need to investigate precisely > why this is, but I think our test suite being slower and deterministic is > better than it being fast but flakey. I'm not so sure about that. If that were the case, we would be restarting DRT/WTR between each test but we don't. Also, people have argued for using the same instance to catch hard-to-reproduce bugs that only manifest when the process had been running for a while such as leaks that accumulate over time, or states not being cleared between page loads.
Ryosuke Niwa
Comment 18 2017-05-09 14:19:43 PDT
The change of this magnitude should probably be discussed on webkit-dev first and get a consensus.
Jonathan Bedard
Comment 19 2017-05-09 14:45:52 PDT
(In reply to Ryosuke Niwa from comment #17) > ... > > I'm not so sure about that. If that were the case, we would be restarting > DRT/WTR between each test but we don't. Also, people have argued for using > the same instance to catch hard-to-reproduce bugs that only manifest when > the process had been running for a while such as leaks that accumulate over > time, or states not being cleared between page loads. The central problem with the way we currently do things is that it's not deterministic. I agree that we shouldn't restart DRT/WTR between each test. However, the way we currently run things, any given test run will change how the tests are sharded. Meaning that hard-to-reproduce failures will get reproduced, but without a solid sequence of steps to reproduce them. This is why the change in this bug only restarts the test runners between shards. This means that we get the best of both worlds. A (mostly) persistent test runner that will retain state within a test shard, but, will also provide predictable and repeatable results. I don't think that test failure we cannot reproduce are helpful. They don't get acted on.
Jonathan Bedard
Comment 20 2017-05-09 14:46:27 PDT
(In reply to Ryosuke Niwa from comment #18) > The change of this magnitude should probably be discussed on webkit-dev > first and get a consensus. I agree, this bug was because Dean wanted to see what this change would look like. I have no intention of committing this until there is more of a consensus and the issue is better understood.
Ryosuke Niwa
Comment 21 2017-05-09 14:48:44 PDT
(In reply to Jonathan Bedard from comment #19) > (In reply to Ryosuke Niwa from comment #17) > > ... > > > > I'm not so sure about that. If that were the case, we would be restarting > > DRT/WTR between each test but we don't. Also, people have argued for using > > the same instance to catch hard-to-reproduce bugs that only manifest when > > the process had been running for a while such as leaks that accumulate over > > time, or states not being cleared between page loads. > > The central problem with the way we currently do things is that it's not > deterministic. Well, we've always had that issue, and this issue had been discussed extensively in the past. If you're changing the status quo, you must get buy-in from other contributors who are making WebKit code changes. > I agree that we shouldn't restart DRT/WTR between each test. However, the > way we currently run things, any given test run will change how the tests > are sharded. Meaning that hard-to-reproduce failures will get reproduced, > but without a solid sequence of steps to reproduce them. This is why the > change in this bug only restarts the test runners between shards. This > means that we get the best of both worlds. A (mostly) persistent test > runner that will retain state within a test shard, but, will also provide > predictable and repeatable results. Well, that's one solution to the problem of inconsistent states, and people have varying opinions on this matter. As such, I don't think we should be making changes like this without getting a buy-in from other engineers. > I don't think that test failure we cannot reproduce are helpful. They don't > get acted on. Except they do. Alexey fixed a number of bugs like that, and some of them turned out to be serious IPC bugs, and states in WebView / WKView not being cleared across page loads.
Jonathan Bedard
Comment 22 2021-08-05 15:35:51 PDT
Jonathan Bedard
Comment 23 2021-08-05 15:43:52 PDT
Reviving this bug. We have some data from https://bugs.webkit.org/show_bug.cgi?id=228429 that indicates that we have a handful of tests that fail when we change the order of shards. We need to change the order shards run in, to speed up run-webkit-tests, but we can't do that as long as doing so causes flakey failures. This should mean that the combination of this change and https://bugs.webkit.org/show_bug.cgi?id=228429 will be a performance improvement.
Jonathan Bedard
Comment 24 2021-08-09 08:52:57 PDT
Radar WebKit Bug Importer
Comment 25 2021-09-16 23:20:05 PDT
Note You need to log in before you can comment on or make changes to this bug.