Bug 59570 - new-run-webkit-tests is flaky on mac
Summary: new-run-webkit-tests is flaky on mac
Status: RESOLVED WORKSFORME
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests
Version: 528+ (Nightly build)
Hardware: PC
OS: OS X 10.5
Importance: P2 Normal
Assignee: Geoffrey Garen
URL:
Keywords:
Depends on:
Blocks: 64491
Reported: 2011-04-26 18:39 PDT by Geoffrey Garen
Modified: 2012-06-08 16:09 PDT (History)
8 users

See Also:


Attachments
patch (6.62 KB, text/plain)
2011-04-26 18:40 PDT, Geoffrey Garen
no flags
patch (4.59 KB, patch)
2011-04-26 18:56 PDT, Geoffrey Garen
no flags
patch (6.43 KB, patch)
2011-04-27 18:23 PDT, Geoffrey Garen
no flags
Patch (7.79 KB, patch)
2011-04-27 21:53 PDT, Geoffrey Garen
levin: review-

Description Geoffrey Garen 2011-04-26 18:39:46 PDT
I have a patch that bundles up all flaky tests to run at the end.

Seems to fail the http tests, though.
Comment 1 Geoffrey Garen 2011-04-26 18:40:29 PDT
Created attachment 91206 [details]
patch

Here's the patch.
Comment 2 Dirk Pranke 2011-04-26 18:51:01 PDT
when you say "seems to fail the http tests", you mean that running with this patch failed on the http tests, right?

without this patch, do the http tests run okay?
Comment 3 Geoffrey Garen 2011-04-26 18:56:42 PDT
Created attachment 91209 [details]
patch

Now with http tests fixed.
Comment 4 Geoffrey Garen 2011-04-26 18:58:56 PDT
> when you say "seems to fail the http tests", you mean that running with this patch failed on the http tests, right?

Yes.

> without this patch, do the http tests run okay?

Yes.

Should be fixed in the latest patch. It was a bad rename.
Comment 5 Geoffrey Garen 2011-04-27 18:23:56 PDT
Created attachment 91396 [details]
patch

Updated to retry more reliably and serialize Java and a few more things.
Comment 6 Dirk Pranke 2011-04-27 18:37:04 PDT
how well do the tests work when you run w/o this patch? If we retry the failures, does everything pass for you on the second run?
Comment 7 Geoffrey Garen 2011-04-27 21:50:28 PDT
(In reply to comment #6)
> how well do the tests work when you run w/o this patch? If we retry the failures, does everything pass for you on the second run?

Without this patch, running with 16-32 parallel instances, I got ~10 failures per run, and the failures didn't necessarily pass on retry. The failures seemed to change unpredictably between runs. (At least, there were enough failures with enough variety that I couldn't suss out a pattern.)
Comment 8 Geoffrey Garen 2011-04-27 21:53:10 PDT
Created attachment 91425 [details]
Patch
Comment 9 Geoffrey Garen 2011-04-27 21:58:18 PDT
I'm not super-familiar with this code, and I've never programmed in Python before, but I think this patch is worth an r?. On my Snow Leopard Mac Pro, it makes new-run-webkit-tests basically as reliable as run-webkit-tests.
Comment 10 Dirk Pranke 2011-04-27 22:16:15 PDT
Comment on attachment 91425 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=91425&action=review

Given your responses, I'm okay with adding a concept like this to the test harness in order to make tests more reliable, but we wouldn't want to land this patch as is. I've got some specific comments below, but unless you really want to learn this code better and hack on it yourself, you can leave this for a day or two and I'll be happy to implement it.


> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:72
> +_locked_tests = [

We definitely wouldn't want to hard-code the list of tests and directories into this file, which is generic code shared across all of the implementations. One possibility would be to add methods on the Port objects (e.g. webkitpy/layout_tests/port/webkit.py) to return lists of files and directories that needed to be serialized.

However, it's probably better to pull this out of code altogether and use the test_expectations.txt file to track this. I think the right way to go would be to add a SERIAL modifier (or some similar name), just like we have modifiers to indicate that tests can be SLOW.
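[Editor's note: a minimal sketch of how a SERIAL modifier could partition tests once parsed out of test_expectations.txt. The line syntax, function names, and uppercase-token heuristic here are illustrative assumptions, not the real webkitpy parser.]

```python
def parse_modifiers(line):
    """Return (modifiers, test_path) for a hypothetical line like
    'BUGX SERIAL : http/tests/foo.html = PASS'."""
    left, _, _ = line.partition('=')
    tokens = left.replace(':', ' ').split()
    # Treat all-uppercase tokens (BUGX, SERIAL, SLOW, ...) as modifiers.
    modifiers = {t for t in tokens if t.isupper()}
    path = tokens[-1]
    return modifiers, path

def partition_tests(expectation_lines):
    """Split tests into those that must run serially and those that
    can run in parallel, based on the SERIAL modifier."""
    serial, parallel = [], []
    for line in expectation_lines:
        modifiers, path = parse_modifiers(line)
        (serial if 'SERIAL' in modifiers else parallel).append(path)
    return serial, parallel
```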

> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:103
> +    # These tests rely on filesystem state, so they can't run in parallel.

This is almost certainly not true in general. It isn't for the Chromium ports, at least. There may be some tests that have state, but not all of them do, and you would lose a lot of parallelism by enforcing this so broadly.

> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:-595
> -            int(self._options.child_processes) > 1 and not self._options.experimental_fully_parallel)

By itself, I think the idea of running all of the retried tests serially makes sense. There should rarely be many of them. However, it might make sense to indicate this simply by changing the value of options.child_processes to 1 before the retry (or passing the child_processes value in as a parameter instead).
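[Editor's note: the serial-retry suggestion can be sketched as a small wrapper that forces a single worker on the second pass. run_with_retry and its arguments are hypothetical stand-ins for the real manager code, not actual webkitpy APIs.]

```python
def run_with_retry(tests, options, run_tests):
    """Run tests with the configured worker count; retry any failures
    with a single worker, since flaky tests often pass without
    parallel load."""
    failures = run_tests(tests, options['child_processes'])
    if failures:
        # Force child_processes to 1 for the retry pass.
        failures = run_tests(failures, 1)
    return failures
```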

> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:696
> +                worker_number = num_workers + 1

This block of code duplicates the above block, and should be refactored into a separate method.
Comment 11 David Levin 2011-04-28 13:09:53 PDT
Comment on attachment 91425 [details]
Patch

Based on Dirk's comments, it looks like there are some reasonable things to address before this can be landed, so r-.
Comment 12 Ojan Vafai 2011-05-10 11:14:50 PDT
(In reply to comment #10)
> (From update of attachment 91425 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=91425&action=review
> > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:72
> > +_locked_tests = [
> 
> We definitely wouldn't want to hard-code the list of tests and directories into this file, which is generic code shared across all of the implementations. One possibility would be to add methods on the Port objects (e.g. webkitpy/layout_tests/port/webkit.py) to return lists of files and directories that needed to be serialized.
> 
> However, it's probably better to pull this out of code altogether and use the test_expectations.txt file to track this. I think the right way to go would be to add a SERIAL modifier (or some similar name), just like we have modifiers to indicate that tests can be SLOW.

I don't think I agree. In general, the problem here is often with the tests themselves, so the list will want to be shared across ports. That said, it might make sense to put it in port/base.py and that way different ports can add to/override the list in base.py.
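[Editor's note: the port/base.py idea might look like the following, with a shared list on the base class that individual ports extend or override. Class names, method names, and the directory entries are illustrative, not real webkitpy code.]

```python
class Port:
    """Base class shared by all ports."""
    def serialized_test_dirs(self):
        # Directories whose tests contend for shared state
        # (e.g. the http server), common to every port.
        return ['http/tests', 'websocket/tests']

class ChromiumPort(Port):
    """A port that adds its own serialized directories to the
    shared base list."""
    def serialized_test_dirs(self):
        return super().serialized_test_dirs() + ['printing']
```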

> > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:103
> > +    # These tests rely on filesystem state, so they can't run in parallel.
> 
> This is almost certainly not true in general. It isn't for the Chromium ports, at least. There may be some tests that have state, but not all of them do, and you would lose a lot of parallelism by enforcing this so broadly.
> 
> > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:-595
> > -            int(self._options.child_processes) > 1 and not self._options.experimental_fully_parallel)
> 
> By itself, I think the idea of running all of the retried tests serially makes sense. There should rarely be many of them. However, it might make sense to indicate this simply by changing the value of options.child_processes to 1 before the retry (or passing the child_processes value in as a parameter instead).

Have we tried whether just making the retry step serial is sufficient? It would be great if we didn't need to maintain a list of tests at all. Certainly we could commit that part of this patch and see where that gets us on the NRWT bot.
Comment 13 Geoffrey Garen 2011-05-10 11:46:55 PDT
> Have we tried whether just making the retry step serial is sufficient? It would be great if we didn't need to maintain a list of tests at all. Certainly we could commit that part of this patch and see where that gets us on the NRWT bot.

I haven't tried that. Seems like it could work.
Comment 14 Eric Seidel (no email) 2011-07-05 13:29:54 PDT
Comment on attachment 91425 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=91425&action=review

>>> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:72
>>> +_locked_tests = [
>> 
>> We definitely wouldn't want to hard-code the list of tests and directories into this file, which is generic code shared across all of the implementations. One possibility would be to add methods on the Port objects (e.g. webkitpy/layout_tests/port/webkit.py) to return lists of files and directories that needed to be serialized.
>> 
>> However, it's probably better to pull this out of code altogether and use the test_expectations.txt file to track this. I think the right way to go would be to add a SERIAL modifier (or some similar name), just like we have modifiers to indicate that tests can be SLOW.
> 
> I don't think I agree. In general, the problem here is often with the tests themselves, so the list will want to be shared across ports. That said, it might make sense to put it in port/base.py and that way different ports can add to/override the list in base.py.

I thought we were using test_expectations for this sort of thing?
http://trac.webkit.org/browser/trunk/LayoutTests/platform/mac/test_expectations.txt#L66

>>> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:103
>>> +    # These tests rely on filesystem state, so they can't run in parallel.
>> 
>> This is almost certainly not true in general. It isn't for the Chromium ports, at least. There may be some tests that have state, but not all of them do, and you would lose a lot of parallelism by enforcing this so broadly.
> 
> Have we tried whether just making the retry step serial is sufficient? It would be great if we didn't need to maintain a list of tests at all. Certainly we could commit that part of this patch and see where that gets us on the NRWT bot.

I believe the current approach is that tests which fail will just be retried (in serial).  That's not ideal though because then the tests are reported as "flaky" even though we expected them to fail when run in parallel.

I think we should work on encoding this information in the LayoutTests file system, and making DRT more parallel-friendly.

During the proposed transition (this week), we're just running NRWT with a single child process on the bots to avoid these issues.

> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:114
> +    # Java plug-in seems to malfunction unpredictably when used under parallel load.

Java seems to malfunction in general. :) The commit-queue has filed several flaky-test bugs about Java-related tests.
Comment 15 Dirk Pranke 2011-07-05 13:38:01 PDT
(In reply to comment #14)
> (From update of attachment 91425 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=91425&action=review
> 
> >>> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:72
> >>> +_locked_tests = [
> >> 
> >> We definitely wouldn't want to hard-code the list of tests and directories into this file, which is generic code shared across all of the implementations. One possibility would be to add methods on the Port objects (e.g. webkitpy/layout_tests/port/webkit.py) to return lists of files and directories that needed to be serialized.
> >> 
> >> However, it's probably better to pull this out of code altogether and use the test_expectations.txt file to track this. I think the right way to go would be to add a SERIAL modifier (or some similar name), just like we have modifiers to indicate that tests can be SLOW.
> > 
> > I don't think I agree. In general, the problem here is often with the tests themselves, so the list will want to be shared across ports. That said, it might make sense to put it in port/base.py and that way different ports can add to/override the list in base.py.
> 
> I thought we were using test_expectations for this sort of thing?
> http://trac.webkit.org/browser/trunk/LayoutTests/platform/mac/test_expectations.txt#L66
> 

We were (or I was, at least). Three problems with this approach ...

1) It is unclear if non-Chromium ports want to use test expectations at all. 

2) It is unclear how to share the knowledge of tests that are generally flaky across ports if each port has its own test_expectations file.

3) Given the current implementation of test_expectations, the best we can do is to mark a test as "PASS FAIL", which is pretty broad and can cause an awful lot of things to be hidden. 

Per my comment in #10, if we wanted to make things cleaner via the test_expectations.txt route, I think the best approach would be to introduce a new modifier like "SERIAL" to complement "SLOW", indicating that the test should be run serially.
Comment 16 Eric Seidel (no email) 2011-08-08 10:14:44 PDT
I'm not sure what to do with this bug.  I guess we'll keep it around until we complete the transition of all bots over to parallel NRWT.  We'll have to solve this sort of issue somehow to do that conversion.
Comment 17 Dirk Pranke 2011-12-21 14:08:08 PST
ggaren, do you think we can close this now? It seems like retrying the failures is probably good enough?
Comment 18 Dirk Pranke 2012-06-08 16:09:06 PDT
I'm gonna close this; please reopen or file a new bug if you wish for us to do something more.