WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
NEW
171615
webkitpy: Restart driver between test shards
https://bugs.webkit.org/show_bug.cgi?id=171615
Summary
webkitpy: Restart driver between test shards
Jonathan Bedard
Reported
2017-05-03 13:16:49 PDT
One the cause of flakey tests is that some state can be retained between tests. To mitigate this, we could restart the driver between test shards.
Attachments
Patch
(2.58 KB, patch)
2017-05-03 14:01 PDT
,
Jonathan Bedard
no flags
Details
Formatted Diff
Diff
Archive of layout-test-results from ews101 for mac-elcapitan
(780.57 KB, application/zip)
2017-05-03 15:07 PDT
,
Build Bot
no flags
Details
Archive of layout-test-results from ews107 for mac-elcapitan-wk2
(815.87 KB, application/zip)
2017-05-03 15:13 PDT
,
Build Bot
no flags
Details
Archive of layout-test-results from ews112 for mac-elcapitan
(1.53 MB, application/zip)
2017-05-03 15:19 PDT
,
Build Bot
no flags
Details
Archive of layout-test-results from ews121 for ios-simulator-wk2
(14.16 MB, application/zip)
2017-05-03 15:39 PDT
,
Build Bot
no flags
Details
Patch
(2.59 KB, patch)
2021-08-05 15:35 PDT
,
Jonathan Bedard
no flags
Details
Formatted Diff
Diff
Patch
(2.78 KB, patch)
2021-08-09 08:52 PDT
,
Jonathan Bedard
no flags
Details
Formatted Diff
Diff
Show Obsolete
(6)
View All
Add attachment
proposed patch, testcase, etc.
Alexey Proskuryakov
Comment 1
2017-05-03 13:20:25 PDT
What state can persist in the driver? Do we have evidence that this happens?
Jonathan Bedard
Comment 2
2017-05-03 13:25:40 PDT
(In reply to Alexey Proskuryakov from
comment #1
)
> What state can persist in the driver? Do we have evidence that this happens?
There is evidence that this has happened, although, I couldn't give an example of it happening right now. Talking to Dean and Ryan, this is a possible technique to decrease the number of flakey tests.
Jonathan Bedard
Comment 3
2017-05-03 14:01:11 PDT
Created
attachment 308947
[details]
Patch
Build Bot
Comment 4
2017-05-03 15:07:54 PDT
Comment on
attachment 308947
[details]
Patch
Attachment 308947
[details]
did not pass mac-ews (mac): Output:
http://webkit-queues.webkit.org/results/3666850
New failing tests: imported/w3c/web-platform-tests/html/browsers/windows/browsing-context.html
Build Bot
Comment 5
2017-05-03 15:07:55 PDT
Created
attachment 308962
[details]
Archive of layout-test-results from ews101 for mac-elcapitan The attached test failures were seen while running run-webkit-tests on the mac-ews. Bot: ews101 Port: mac-elcapitan Platform: Mac OS X 10.11.6
Build Bot
Comment 6
2017-05-03 15:13:10 PDT
Comment on
attachment 308947
[details]
Patch
Attachment 308947
[details]
did not pass mac-wk2-ews (mac-wk2): Output:
http://webkit-queues.webkit.org/results/3666856
New failing tests: imported/w3c/web-platform-tests/html/browsers/windows/browsing-context.html
Build Bot
Comment 7
2017-05-03 15:13:11 PDT
Created
attachment 308964
[details]
Archive of layout-test-results from ews107 for mac-elcapitan-wk2 The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews. Bot: ews107 Port: mac-elcapitan-wk2 Platform: Mac OS X 10.11.6
Build Bot
Comment 8
2017-05-03 15:19:09 PDT
Comment on
attachment 308947
[details]
Patch
Attachment 308947
[details]
did not pass mac-debug-ews (mac): Output:
http://webkit-queues.webkit.org/results/3666854
New failing tests: imported/w3c/web-platform-tests/html/browsers/windows/browsing-context.html
Build Bot
Comment 9
2017-05-03 15:19:10 PDT
Created
attachment 308965
[details]
Archive of layout-test-results from ews112 for mac-elcapitan The attached test failures were seen while running run-webkit-tests on the mac-debug-ews. Bot: ews112 Port: mac-elcapitan Platform: Mac OS X 10.11.6
Ryan Haddad
Comment 10
2017-05-03 15:24:34 PDT
(In reply to Build Bot from
comment #8
)
> Comment on
attachment 308947
[details]
> Patch > >
Attachment 308947
[details]
did not pass mac-debug-ews (mac): > Output:
http://webkit-queues.webkit.org/results/3666854
> > New failing tests: > imported/w3c/web-platform-tests/html/browsers/windows/browsing-context.html
--- /Volumes/Data/EWS/WebKit/WebKitBuild/Release/layout-test-results/imported/w3c/web-platform-tests/html/browsers/windows/browsing-context-expected.txt +++ /Volumes/Data/EWS/WebKit/WebKitBuild/Release/layout-test-results/imported/w3c/web-platform-tests/html/browsers/windows/browsing-context-actual.txt @@ -2,5 +2,5 @@ FAIL Check that browsing context has new, ready HTML document assert_equals: The document's encoding should be 'UTF-8'. expected "UTF-8" but got "windows-1252" PASS Check that new document nodes extant, empty PASS Check the document properties corresponding to the creator browsing context -FAIL Check the history.length of the created browsing context assert_equals: The history.length should be 1. expected 1 but got 100 +FAIL Check the history.length of the created browsing context assert_equals: The history.length should be 1. expected 1 but got 4 I guess this is a progression?
Build Bot
Comment 11
2017-05-03 15:39:58 PDT
Comment on
attachment 308947
[details]
Patch
Attachment 308947
[details]
did not pass ios-sim-ews (ios-simulator-wk2): Output:
http://webkit-queues.webkit.org/results/3666897
New failing tests: imported/w3c/web-platform-tests/html/browsers/windows/browsing-context.html
Build Bot
Comment 12
2017-05-03 15:39:59 PDT
Created
attachment 308975
[details]
Archive of layout-test-results from ews121 for ios-simulator-wk2 The attached test failures were seen while running run-webkit-tests on the ios-sim-ews. Bot: ews121 Port: ios-simulator-wk2 Platform: Mac OS X 10.11.6
Alexey Proskuryakov
Comment 13
2017-05-03 16:54:20 PDT
I would be against making this change without fully understanding why it's needed.
Jonathan Bedard
Comment 14
2017-05-04 16:05:14 PDT
(In reply to Alexey Proskuryakov from
comment #13
)
> I would be against making this change without fully understanding why it's > needed.
What I find baffling here is EWS indicates that new tests are failing. This would mean that we have tests which are relying on state set in earlier tests. I'd like to see if I can reproduce any of these failures locally.
Ryosuke Niwa
Comment 15
2017-05-08 20:58:29 PDT
Comment on
attachment 308947
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=308947&action=review
> Tools/ChangeLog:3 > + webkitpy: Restart driver between test shards
This will slow down tests.
> Tools/Scripts/webkitpy/layout_tests/run_webkit_tests.py:265 > + optparse.make_option("--persistent-driver", action="store_true", > + default=False, help="Attempt to keep the driver running as long as possible. Without this option, the driver will restart between shards."),
Given the other option to restart the driver between each test is called --run-singly, perhaps we should call this --run-persistently. It most certainly shouldn't be a noun.
Jonathan Bedard
Comment 16
2017-05-09 08:11:00 PDT
(In reply to Ryosuke Niwa from
comment #15
)
> Comment on
attachment 308947
[details]
> Patch > > View in context: >
https://bugs.webkit.org/attachment.cgi?id=308947&action=review
> > > Tools/ChangeLog:3 > > + webkitpy: Restart driver between test shards > > This will slow down tests. >
Yes, it will. But, the results so far indicate that state is being retained between tests and effecting the results. I need to investigate precisely why this is, but I think our test suite being slower and deterministic is better than it being fast but flakey.
Ryosuke Niwa
Comment 17
2017-05-09 14:19:05 PDT
(In reply to Jonathan Bedard from
comment #16
)
> (In reply to Ryosuke Niwa from
comment #15
) > > Comment on
attachment 308947
[details]
> > Patch > > > > View in context: > >
https://bugs.webkit.org/attachment.cgi?id=308947&action=review
> > > > > Tools/ChangeLog:3 > > > + webkitpy: Restart driver between test shards > > > > This will slow down tests. > > > > Yes, it will. But, the results so far indicate that state is being retained > between tests and effecting the results. I need to investigate precisely > why this is, but I think our test suite being slower and deterministic is > better than it being fast but flakey.
I'm not so sure about that. If that were the case, we would be restarting DRT/WTR between each test but we don't. Also, people have argued for using the same instance to catch hard-to-reproduce bugs that only manifest when the process had been running for a while such as leaks that accumulate over time, or states not being cleared between page loads.
Ryosuke Niwa
Comment 18
2017-05-09 14:19:43 PDT
The change of this magnitude should probably be discussed on webkit-dev first and get a consensus.
Jonathan Bedard
Comment 19
2017-05-09 14:45:52 PDT
(In reply to Ryosuke Niwa from
comment #17
)
> ... > > I'm not so sure about that. If that were the case, we would be restarting > DRT/WTR between each test but we don't. Also, people have argued for using > the same instance to catch hard-to-reproduce bugs that only manifest when > the process had been running for a while such as leaks that accumulate over > time, or states not being cleared between page loads.
The central problem with the way we currently do things is that it's not deterministic. I agree that we shouldn't restart DRT/WTR between each test. However, the way we currently run things, any given test run will change how the tests are sharded. Meaning that hard-to-reproduce failures will get reproduced, but without a solid sequence of steps to reproduce them. This is why the change in this bug only restarts the test runners between shards. This means that we get the best of both worlds. A (mostly) persistent test runner that will retain state within a test shard, but, will also provide predictable and repeatable results. I don't think that test failure we cannot reproduce are helpful. They don't get acted on.
Jonathan Bedard
Comment 20
2017-05-09 14:46:27 PDT
(In reply to Ryosuke Niwa from
comment #18
)
> The change of this magnitude should probably be discussed on webkit-dev > first and get a consensus.
I agree, this bug was because Dean wanted to see what this change would look like. I have no intention of committing this until there is more of a consensus and the issue is better understood.
Ryosuke Niwa
Comment 21
2017-05-09 14:48:44 PDT
(In reply to Jonathan Bedard from
comment #19
)
> (In reply to Ryosuke Niwa from
comment #17
) > > ... > > > > I'm not so sure about that. If that were the case, we would be restarting > > DRT/WTR between each test but we don't. Also, people have argued for using > > the same instance to catch hard-to-reproduce bugs that only manifest when > > the process had been running for a while such as leaks that accumulate over > > time, or states not being cleared between page loads. > > The central problem with the way we currently do things is that it's not > deterministic.
Well, we've always had that issue, and this issue had been discussed extensively in the past. If you're changing the status quo, you must get buy-in from other contributors who are making WebKit code changes.
> I agree that we shouldn't restart DRT/WTR between each test. However, the > way we currently run things, any given test run will change how the tests > are sharded. Meaning that hard-to-reproduce failures will get reproduced, > but without a solid sequence of steps to reproduce them. This is why the > change in this bug only restarts the test runners between shards. This > means that we get the best of both worlds. A (mostly) persistent test > runner that will retain state within a test shard, but, will also provide > predictable and repeatable results.
Well, that's one solution to the problem of inconsistent states, and people have varying opinions on this matter. As such, I don't think we should be making changes like this without getting a buy-in from other engineers.
> I don't think that test failure we cannot reproduce are helpful. They don't > get acted on.
Except they do. Alexey fixed a number of bugs like that, and some of them turned out to be serious IPC bugs, and states in WebView / WKView not being cleared across page loads.
Jonathan Bedard
Comment 22
2021-08-05 15:35:51 PDT
Created
attachment 435032
[details]
Patch
Jonathan Bedard
Comment 23
2021-08-05 15:43:52 PDT
Reviving this bug. We have some data from
https://bugs.webkit.org/show_bug.cgi?id=228429
that indicates that we have a handful of tests that fail when we change the order of shards. We need to change the order shards run in, to speed up run-webkit-tests, but we can't do that as long as doing so causes flakey failures. This should mean that the combination of this change and
https://bugs.webkit.org/show_bug.cgi?id=228429
will be a performance improvement.
Jonathan Bedard
Comment 24
2021-08-09 08:52:57 PDT
Created
attachment 435186
[details]
Patch
Radar WebKit Bug Importer
Comment 25
2021-09-16 23:20:05 PDT
<
rdar://problem/83227925
>
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug