WebKit Bugzilla
RESOLVED FIXED
Bug 212381: [EWS] Add a special case for running the layout test step without aborting in case of many failures for WPT tests
https://bugs.webkit.org/show_bug.cgi?id=212381
Summary
Carlos Alberto Lopez Perez
Reported
2020-05-26 10:16:20 PDT
I want to prototype a bot that helps to update our WPT import checkout weekly <https://lists.webkit.org/pipermail/webkit-dev/2020-March/031121.html>. To do this, I want to (re-)use our EWS system to get the layout test results for all the platforms (Mac, iOS, etc.). However, I'm finding that the EWS layout test results are not very useful when trying patches that touch a lot of tests (like a WPT import). This is because the EWS runs the layout test step with:

* "--exit-after-n-failures 30" -> a WPT import usually brings thousands of new tests; if the run aborts instead of reporting all the failures, it is not really useful.
* "--skip-failing-tests" -> some tests may crash (or time out) instead of fail on different platforms; if we don't re-check whether they fail, we miss that. See for example r262062 (I only found that after landing the patch, because I had marked that test as failing (since it fails on release), but the debug EWS was skipping it, so I couldn't know it was crashing on debug).

I understand these parameters are useful to make the EWS faster and are good enough for the general use case. So I want to propose not passing those parameters to the EWS for patches uploaded by a specific Bugzilla user (the one this bot would use). That way the EWS can be useful for this bot, while we keep the current behavior for everybody else. It would also help (to make everything faster and simpler) if, for this user, the layout test step only ran the tests inside the folder "imported/w3c/web-platform-tests/", since those are the only ones of interest to it anyway.
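The proposed special case could be sketched roughly like this (an illustrative sketch only; `layout_test_arguments`, `WPT_IMPORT_BOTS`, and the exact flag set are assumptions, not the actual Tools/BuildSlaveSupport/ews-build/steps.py code):

```python
# Hypothetical sketch of the proposed EWS special case: relax the
# fail-fast limits and narrow the test set for WPT import bot patches.
WPT_IMPORT_BOTS = ['webkit-wpt-import-bot@igalia.com']  # assumed list


def layout_test_arguments(patch_author):
    """Build run-webkit-tests arguments depending on who uploaded the patch."""
    args = ['--no-new-test-results']
    if patch_author in WPT_IMPORT_BOTS:
        # Run only the imported WPT suite and report every failure.
        args.append('imported/w3c/web-platform-tests/')
    else:
        # Default EWS behavior: abort early and skip known-failing tests.
        args += ['--exit-after-n-failures', '30', '--skip-failing-tests']
    return args
```

A later review comment suggests making the author check a list, which the sketch above already does.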
Attachments
Patch (15.64 KB, patch), 2020-05-26 10:22 PDT, Carlos Alberto Lopez Perez, no flags
Patch (15.64 KB, patch), 2020-05-27 11:45 PDT, Carlos Alberto Lopez Perez, no flags
Carlos Alberto Lopez Perez
Comment 1
2020-05-26 10:22:45 PDT
Created attachment 400257 [details]
Patch
Carlos Alberto Lopez Perez
Comment 2
2020-05-26 10:43:01 PDT
Note: I think the style check error is wrong. I followed the same style as the other tests above.
Jonathan Bedard
Comment 3
2020-05-26 11:15:16 PDT
Comment on attachment 400257 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=400257&action=review

> Tools/BuildSlaveSupport/ews-build/steps.py:1712
> +        if patch_author == 'webkit-wpt-import-bot@igalia.com':

This is an interesting idea. Aakash should weigh in. It does feel like this should be a list; it seems likely that Apple will want a WPT import bot at some point in the future.
Alexey Proskuryakov
Comment 4
2020-05-26 11:45:17 PDT
Comment on attachment 400257 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=400257&action=review

> Tools/ChangeLog:3
> +        [EWS] Add a special case for running the layout test step without aborting in case of many failures for WPT tests

Not directly related to this patch, just a suggestion. I think that this EWS also needs to run a lot of iterations to detect flaky tests. That would likely be more intricate than simply passing an argument for iteration count.
Carlos Alberto Lopez Perez
Comment 5
2020-05-27 10:52:46 PDT
(In reply to Jonathan Bedard from comment #3)
> This is an interesting idea. Aakash should weigh in.

I talked with him on Slack; he was OK with this idea.

> It does feel like this should be a list, it seems likely that Apple will
> want a wpt import bot at some point in the future.

Yes! I will turn that into a list.
Carlos Alberto Lopez Perez
Comment 6
2020-05-27 11:32:08 PDT
(In reply to Alexey Proskuryakov from comment #4)
> Not directly related to this patch, just a suggestion. I think that this EWS
> also needs to run a lot of iterations to detect flaky tests. That would
> likely be more intricate than simply passing an argument for iteration count.

Great idea. I was playing with passing "--iterations N" to run-webkit-tests, and it doesn't behave as I was expecting. It seems it marks a test as failing if it failed on > N/2 iterations, but otherwise marks it as a pass. For example: over 3 iterations, a test passes 2 and fails 1, and it is simply reported as a pass. So I think I need something different. Possible ideas that come to mind:

* Add a new "--detect-flaky-tests N" switch to run-webkit-tests that does the same as "--iterations" but reports all the different results it got for each test.
* Have the bot upload the final patch N times and require everything to be green (on the EWS) all N times before asking for review.

Not sure yet which approach will be best.
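The reporting side of the proposed "--detect-flaky-tests N" switch could look roughly like this (an illustrative sketch; `summarize_iterations` and its result shape are assumptions, not actual run-webkit-tests code):

```python
from collections import Counter


def summarize_iterations(results_per_iteration):
    """Report every distinct result a test produced across iterations,
    instead of collapsing to a single pass/fail by majority vote.

    results_per_iteration: list of {test_name: result} dicts, one per run.
    Returns {test_name: {result: count}} for tests with >1 distinct result.
    """
    per_test = {}
    for results in results_per_iteration:
        for test, result in results.items():
            per_test.setdefault(test, Counter())[result] += 1
    # A test with more than one distinct result across iterations is flaky.
    return {test: dict(counts) for test, counts in per_test.items()
            if len(counts) > 1}


runs = [{'a.html': 'PASS'}, {'a.html': 'FAIL'}, {'a.html': 'PASS'}]
print(summarize_iterations(runs))  # {'a.html': {'PASS': 2, 'FAIL': 1}}
```

With "--iterations 3" as described above, this run would be reported as a plain pass; the sketch instead surfaces both observed results and lets the user/tool decide what to do, as proposed later in the thread.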
Carlos Alberto Lopez Perez
Comment 7
2020-05-27 11:45:57 PDT
Created attachment 400359 [details]
Patch
Jonathan Bedard
Comment 8
2020-05-27 11:51:57 PDT
(In reply to Carlos Alberto Lopez Perez from comment #6)
> I was playing now with passing "--iterations N" to run-webkit-tests and it
> doesn't behave as I was expecting. It seems it marks the test as failing if
> it failed on > N/2 iterations, but otherwise marks it as pass. For example:
> on 3 iterations a tests passes 2 but fails 1, it simply reports it as pass.

To me, that feels like the wrong behavior. If you run with multiple iterations, we should use the worst result as the result of the test, but that's definitely a different bug.
Carlos Alberto Lopez Perez
Comment 9
2020-05-27 12:02:06 PDT
(In reply to Jonathan Bedard from comment #8)
> To me, that feels like the wrong behavior. If you run with multiple
> iterations, we should use the worst result as the result of the test, but
> that's definitely a different bug.

It's not always obvious which result is the worst. For example, imagine a test that gives: Failure, Timeout, and Pass. Which is worse: Failure or Timeout? The expectation would have to contain both cases (or be a Skip) to be OK. So I see value in reporting all the results and letting the user/tool choose how to deal with that.
Alexey Proskuryakov
Comment 10
2020-05-27 12:17:18 PDT
My idea was that the logic would be in EWS, not in run-webkit-tests. That removes any API barriers that could be complicating flaky test detection and reporting. A counter-argument is that having the logic in EWS means that we need to restart the master when code changes.
Jonathan Bedard
Comment 11
2020-05-27 12:48:56 PDT
(In reply to Carlos Alberto Lopez Perez from comment #9)
> Its not always obvious which one its the worst result. For example, imagine
> a test that gives: Failure Timeout and Pass.
>
> So I see value in reporting all the results and letting the user/tool choose
> how to deal with that.

There is actually a canonical hierarchy we defined for the results database (in Tools/Scripts/webkitpy/results/upload.py), but your point about needing multiple expectations is a good one.
Carlos Alberto Lopez Perez
Comment 12
2020-05-27 13:29:45 PDT
(In reply to Alexey Proskuryakov from comment #10)
> My idea was that the logic would be in EWS, not in run-webkit-tests. That
> removes any API barriers that could be complicating flaky test detection and
> reporting.
>
> A counter-argument is that having the logic in EWS means that we need to
> restart the master when code changes.

Having the logic in the EWS is a really interesting idea.

Currently the EWS does this:

- run1: layout-tests with patch
- run2: layout-tests with patch (only run if unexpected failures in run1)
- run3: layout-tests without patch (only run if unexpected failures in run1)

* EWS status: green if new failures in both run1 and run2 (but not run3). If a new failure happened on run1 but not run2, it is ignored.

I'm thinking of modifying the logic of the EWS so that for this special use case it does this:

- run1: layout-tests without patch
- run2: layout-tests with patch (only run if _no_ unexpected failures in run1)
- run3: layout-tests with patch (only run if _no_ unexpected failures in run2)
- run4: layout-tests with patch (only run if _no_ unexpected failures in run3)

* EWS result: green if there were no unexpected failures in any of the runs and all 4 runs completed.

That way I can assert that:

- No pre-existing failures existed (for example, due to the revision of WebKit being tested having issues).
- It passed 3 runs with the patch without any unexpected results.

What do you think?
Carlos Alberto Lopez Perez
Comment 13
2020-05-27 13:31:28 PDT
(In reply to Carlos Alberto Lopez Perez from comment #12)
> * EWS status: green if new failures in both run1 and run2 (but not run3).
> If a new failure happened on run1 but not run2, it is ignored.

Sorry, I meant: * EWS status: red if new failures in both run1 and run2 (but not run3).
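The run sequence proposed in comment 12 could be sketched as follows (illustrative only; `ews_status_for_import_bot` and `run_layout_tests` are assumed stand-ins for the real EWS steps, not actual EWS code):

```python
# Hypothetical sketch of the proposed EWS logic for WPT import bot patches:
# a clean baseline run without the patch, then three consecutive clean runs
# with the patch. run_layout_tests is assumed to return the number of
# unexpected failures for one layout-test run.
def ews_status_for_import_bot(run_layout_tests):
    # run1: establish that the tested WebKit revision itself is clean.
    if run_layout_tests(with_patch=False) != 0:
        return 'red'  # pre-existing failures; revision has issues
    # run2..run4: each run only happens if the previous one was clean.
    for _ in range(3):
        if run_layout_tests(with_patch=True) != 0:
            return 'red'
    return 'green'


# A pipeline where every run is clean yields a green status.
print(ews_status_for_import_bot(lambda with_patch: 0))  # green
```

Requiring three clean runs with the patch gives some confidence against flakiness without needing run-webkit-tests changes, at the cost of longer EWS cycles.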
Alexey Proskuryakov
Comment 14
2020-05-27 13:36:54 PDT
One approach would be to provide a .txt with new test expectations; red/green is not so meaningful for imported tests. Ideally, EWS would understand which tests were modified, and only stress test those. Eventually, we'd like to do that even for regular patches, not just for imports.
Carlos Alberto Lopez Perez
Comment 15
2020-05-27 14:23:59 PDT
(In reply to Alexey Proskuryakov from comment #14)
> One approach would be to provide a .txt with new test expectations;
> red/green is not so meaningful for imported tests.

The patch to be tested will contain modified TestExpectations files. The bot updating the tests would open new bugs on Bugzilla (for the failures detected by the EWS on the first tries) and mark those tests as failing in the TestExpectations files. If the EWS is green after testing (the final version of) the patch generated by this bot, that means the patch with the newly imported tests doesn't cause any unexpected failures (all new failures have been triaged). The goal is to automate updating the imported WPT tests without causing new unexpected failures on the bots, to avoid creating extra work for gardeners and QA people. And the reviewer approving the patch for landing would have great confidence in that (since it passed 3 rounds on every EWS without causing any new failure/flakiness).

> Ideally, EWS would understand which tests were modified, and only stress
> test those. Eventually, we'd like to do that even for regular patches, not
> just for imports.

That can be very difficult to do for patches touching support files. A general update of the imported WPT tests may modify (for example) the Python program used to start the HTTP servers needed for testing (./wpt serve), and that may cause issues in specific tests that are not being updated in the tested patch but are part of the set imported by a previous patch. Running the whole set of imported WPT tests is affordable (this would not run daily) and gives me more peace of mind that this will now cause unexpected issues.
Carlos Alberto Lopez Perez
Comment 16
2020-05-27 14:25:03 PDT
(In reply to Carlos Alberto Lopez Perez from comment #15)
> Running the whole set of imported WPT tests is affordable (this would not
> run daily) and gives me more peace of mind that this will now cause
> unexpected issues.

Sorry, I meant: "and gives me more peace of mind that this will not cause unexpected issues".
Carlos Alberto Lopez Perez
Comment 17
2020-05-29 07:44:33 PDT
Can I have r+ on this patch? I would like to start testing the output the EWS gives me. In future patches I can implement what we discussed here (modifying the logic of the EWS), but for now this is good enough for me to start testing.
Jonathan Bedard
Comment 18
2020-05-29 14:07:10 PDT
(In reply to Carlos Alberto Lopez Perez from comment #17)
> Can I have r+ this patch?
>
> I would like to start testing the outputs the EWS gives me. In future
> patches I can implement what we discussed here (trying to modify the logic
> of the EWS). But for now this is good enough for me to start testing.

I was hoping we could get Aakash's input on this before landing, but he has been out this week, so it seems wrong to block landing this on him. I do think we should put off landing this until Monday, because it will require a buildbot restart, which I think is unwise on a Friday afternoon.
Aakash Jain
Comment 19
2020-05-29 18:49:26 PDT
It looks good to me.
EWS
Comment 20
2020-06-01 09:15:47 PDT
Committed r262379: <https://trac.webkit.org/changeset/262379>

All reviewed patches have been landed. Closing bug and clearing flags on attachment 400359 [details].
Radar WebKit Bug Importer
Comment 21
2020-06-01 09:16:19 PDT
<rdar://problem/63831282>
Carlos Alberto Lopez Perez
Comment 22
2020-06-01 09:46:59 PDT
(In reply to Jonathan Bedard from comment #18)
> I do think we should put off landing this until Monday, because it will
> require a buildbot restart, which I think is unwise Friday afternoon.

Sure. I have landed it today (Monday). If it's possible to restart the EWS at some point, that would be great. Thanks!
Jonathan Bedard
Comment 23
2020-06-03 13:19:44 PDT
(In reply to Carlos Alberto Lopez Perez from comment #22)
> If possible to restart the EWS at some point, that would be great.

Buildbot has been restarted. Sorry it took so long to land this!