Bug 59570

Summary:

new-run-webkit-tests is flaky on mac

Product:

WebKit

Reporter:

Geoffrey Garen <ggaren>

Component:

Tools / Tests

Assignee:

Geoffrey Garen <ggaren>

Status:

RESOLVED WORKSFORME

Severity:

Normal

CC:

aroben, dpranke, eric, ggaren, jennb, ojan, sam, tony

Priority:

Version:

528+ (Nightly build)

Hardware:

OS:

OS X 10.5

Bug Depends on:

Bug Blocks:

64491

Attachments:

Description	Flags
patch	none
patch	none
patch	none
Patch	levin: review-

Geoffrey Garen

Reported 2011-04-26 18:39:46 PDT

I have a patch that bundles up all flaky tests to run at the end. Seems to fail the http tests, though.

Attachments
patch (6.62 KB, text/plain) 2011-04-26 18:40 PDT, Geoffrey Garen	no flags	Details
patch (4.59 KB, patch) 2011-04-26 18:56 PDT, Geoffrey Garen	no flags	Details Formatted Diff Diff
patch (6.43 KB, patch) 2011-04-27 18:23 PDT, Geoffrey Garen	no flags	Details Formatted Diff Diff
Patch (7.79 KB, patch) 2011-04-27 21:53 PDT, Geoffrey Garen	levin: review-	Details Formatted Diff Diff
Show Obsolete (3) View All Add attachment proposed patch, testcase, etc.

Geoffrey Garen

Comment 1 2011-04-26 18:40:29 PDT

Created attachment 91206 [details] patch Here's the patch.

Dirk Pranke

Comment 2 2011-04-26 18:51:01 PDT

when you say "seems to fail the http tests", you mean that running with this patch failed on the http tests, right? without this patch, do the http tests run okay?

Geoffrey Garen

Comment 3 2011-04-26 18:56:42 PDT

Created attachment 91209 [details] patch Now with http tests fixed.

Geoffrey Garen

Comment 4 2011-04-26 18:58:56 PDT

> when you say "seems to fail the http tests", you mean that running with this patch failed on the http tests, right? Yes. > without this patch, do the http tests run okay? Yes. Should be fixed in the latest patch. It was a bad rename.

Geoffrey Garen

Comment 5 2011-04-27 18:23:56 PDT

Created attachment 91396 [details] patch Updated to retry more reliably and serialize Java and a few more things.

Dirk Pranke

Comment 6 2011-04-27 18:37:04 PDT

how well do the tests work when you run w/o this patch? If we retry the failures, does everything pass for you on the second run?

Geoffrey Garen

Comment 7 2011-04-27 21:50:28 PDT

(In reply to comment #6) > how well do the tests work when you run w/o this patch? If we retry the failures, does everything pass for you on the second run? Without this patch, running with 16-32 parallel instances, I got ~10 failures per run, and the failures didn't necessarily pass on retry. The failures seemed to change unpredictably between runs. (At least, there were enough failures with enough variety that I couldn't suss out a pattern.)

Geoffrey Garen

Comment 8 2011-04-27 21:53:10 PDT

Created attachment 91425 [details] Patch

Geoffrey Garen

Comment 9 2011-04-27 21:58:18 PDT

I'm not super-familiar with this code, and I've never programmed in Python before, but I think this patch is worth an r?. On my SnowLeopard Mac Pro, it makes new-run-webkit-tests basically as reliable as run-webkit-tests.

Dirk Pranke

Comment 10 2011-04-27 22:16:15 PDT

Comment on attachment 91425 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=91425&action=review Given what you responded to, I'm okay with adding a concept like this to the test harness in order to make tests more reliable, but we wouldn't want to land this patch as is. I've got some specific comments below, but unless you really want to learn the code better and hack on this yourself, you can leave this for a day or two and I'll be happy to implement it. > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:72 > +_locked_tests = [ We definitely wouldn't want to hard-code the list of tests and directories into this file, which is generic code shared across all of the implementations. One possibility would be to add methods on the Port objects (e.g. webkitpy/layout_tests/port/webkit.py) to return lists of files and directories that needed to be serialized. However, it's probably better to pull this out of code altogether and use the test_expectations.txt file to track this. I think the right way to go would be to add a SERIAL modifier (or some similar name), just like we have modifiers to indicate that tests can be SLOW. > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:103 > + # These tests rely on filesystem state, so they can't run in parallel. This is almost certainly not true in general. It isn't for the chromium ports, at least. There may be some tests that have state, but not all of them, and you would lose a lot of parallelism by enforcing this this broadly. > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:-595 > - int(self._options.child_processes) > 1 and not self._options.experimental_fully_parallel) By itself, I think the idea of running all of the retried tests serially makes sense. There should rarely be many of them. However, it might make sense to indicate this simply by changing the value of options.child_processes to 1 before the retry (or passing the child_processes value in as a parameter instead). > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:696 > + worker_number = num_workers + 1 This block of code duplicates the above block, and should be refactored into a separate method.

David Levin

Comment 11 2011-04-28 13:09:53 PDT

Comment on attachment 91425 [details] Patch Based on Dirk's comments, it looks like there are some reasonable things to address before this can be landed,so r-.

Ojan Vafai

Comment 12 2011-05-10 11:14:50 PDT

(In reply to comment #10) > (From update of attachment 91425 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=91425&action=review > > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:72 > > +_locked_tests = [ > > We definitely wouldn't want to hard-code the list of tests and directories into this file, which is generic code shared across all of the implementations. One possibility would be to add methods on the Port objects (e.g. webkitpy/layout_tests/port/webkit.py) to return lists of files and directories that needed to be serialized. > > However, it's probably better to pull this out of code altogether and use the test_expectations.txt file to track this. I think the right way to go would be to add a SERIAL modifier (or some similar name), just like we have modifiers to indicate that tests can be SLOW. I don't think I agree. In general, the problem here is often with the tests themselves, so the list will want to be shared across ports. That said, it might make sense to put it in port/base.py and that way different ports can add to/override the list in base.py. > > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:103 > > + # These tests rely on filesystem state, so they can't run in parallel. > > This is almost certainly not true in general. It isn't for the chromium ports, at least. There may be some tests that have state, but not all of them, and you would lose a lot of parallelism by enforcing this this broadly. > > > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:-595 > > - int(self._options.child_processes) > 1 and not self._options.experimental_fully_parallel) > > By itself, I think the idea of running all of the retried tests serially makes sense. There should rarely be many of them. However, it might make sense to indicate this simply by changing the value of options.child_processes to 1 before the retry (or passing the child_processes value in as a parameter instead). Have we tried whether just making the retry step serial is sufficient? It would be great if we didn't need to maintain a list of tests at all. Certainly we could commit that part of this patch and see where that gets us on the NRWT bot.

Geoffrey Garen

Comment 13 2011-05-10 11:46:55 PDT

> Have we tried whether just making the retry step serial is sufficient? It would be great if we didn't need to maintain a list of tests at all. Certainly we could commit that part of this patch and see where that gets us on the NRWT bot. I haven't tried that. Seems like it could work.

Eric Seidel (no email)

Comment 14 2011-07-05 13:29:54 PDT

Comment on attachment 91425 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=91425&action=review >>> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:72 >>> +_locked_tests = [ >> >> We definitely wouldn't want to hard-code the list of tests and directories into this file, which is generic code shared across all of the implementations. One possibility would be to add methods on the Port objects (e.g. webkitpy/layout_tests/port/webkit.py) to return lists of files and directories that needed to be serialized. >> >> However, it's probably better to pull this out of code altogether and use the test_expectations.txt file to track this. I think the right way to go would be to add a SERIAL modifier (or some similar name), just like we have modifiers to indicate that tests can be SLOW. > > I don't think I agree. In general, the problem here is often with the tests themselves, so the list will want to be shared across ports. That said, it might make sense to put it in port/base.py and that way different ports can add to/override the list in base.py. I thought we were using test_expectations for this sort of thing? http://trac.webkit.org/browser/trunk/LayoutTests/platform/mac/test_expectations.txt#L66 >>> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:103 >>> + # These tests rely on filesystem state, so they can't run in parallel. >> >> This is almost certainly not true in general. It isn't for the chromium ports, at least. There may be some tests that have state, but not all of them, and you would lose a lot of parallelism by enforcing this this broadly. > > Have we tried whether just making the retry step serial is sufficient? It would be great if we didn't need to maintain a list of tests at all. Certainly we could commit that part of this patch and see where that gets us on the NRWT bot. I believe the current approach is that tests which fail will just be retried (in serial). That's not ideal though because then the tests are reported as "flaky" even though we expected them to fail when run in parallel. I think we should work on encoding this information in the LayoutTests file system, and making DRT more parallel-friendly. Durring the proposed transition (this week) we're just running NRWT with a single child process on the bots to avoid these issues. > Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:114 > + # Java plug-in seems to malfunction unpredictably when used under parallel load. Java seems to malfunction in general. :) the commit-queue has filed several flaky-test bugs about java related tests.

Dirk Pranke

Comment 15 2011-07-05 13:38:01 PDT

(In reply to comment #14) > (From update of attachment 91425 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=91425&action=review > > >>> Tools/Scripts/webkitpy/layout_tests/layout_package/manager.py:72 > >>> +_locked_tests = [ > >> > >> We definitely wouldn't want to hard-code the list of tests and directories into this file, which is generic code shared across all of the implementations. One possibility would be to add methods on the Port objects (e.g. webkitpy/layout_tests/port/webkit.py) to return lists of files and directories that needed to be serialized. > >> > >> However, it's probably better to pull this out of code altogether and use the test_expectations.txt file to track this. I think the right way to go would be to add a SERIAL modifier (or some similar name), just like we have modifiers to indicate that tests can be SLOW. > > > > I don't think I agree. In general, the problem here is often with the tests themselves, so the list will want to be shared across ports. That said, it might make sense to put it in port/base.py and that way different ports can add to/override the list in base.py. > > I thought we were using test_expectations for this sort of thing? > http://trac.webkit.org/browser/trunk/LayoutTests/platform/mac/test_expectations.txt#L66 > We were (or I was, at least). Three problems with this approach ... 1) It is unclear if non-Chromium ports want to use test expectations at all. 2) It is unclear how to share the knowledge of tests that are generally flaky across ports if each port has its own test_expectations file. 3) Given the current implementation of test_expectations, the best we can do is to mark a test as "PASS FAIL", which is pretty broad and can cause an awful lot of things to be hidden. Per my comment in #10, If we wanted to make things cleaner via the test_expectations.txt route, I think the best approach to tracking this would be to introduce a new modifier like "SERIAL" to complement "SLOW" to indicate that the test should be run serially.

Eric Seidel (no email)

Comment 16 2011-08-08 10:14:44 PDT

I'm not sure what to do with this bug. I guess we'll keep it around until we complete the transition of all bots over to parallel NRWT. We'll have to solve this sort of issue somehow to do that conversion.

Dirk Pranke

Comment 17 2011-12-21 14:08:08 PST

ggaren, do you think we can close this now? It seems like retrying the failures is probably good enough?

Dirk Pranke

Comment 18 2012-06-08 16:09:06 PDT

I'm gonna close this; please reopen or file a new bug if you wish for us to do something more.

Note You need to log in before you can comment on or make changes to this bug.