Bug 85591 - [Gtk] Many tests revealed as passing after moving from Skipped to test_expectations.txt
Summary: [Gtk] Many tests revealed as passing after moving from Skipped to test_expect...
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Zan Dobersek
Depends on:
Reported: 2012-05-04 01:20 PDT by Zan Dobersek
Modified: 2012-07-20 10:24 PDT (History)
5 users (show)

See Also:

webkit-patch list-passing (7.56 KB, patch)
2012-05-04 01:37 PDT, Zan Dobersek
no flags Details | Formatted Diff | Diff
List of passing tests 05042012 0730UTC (27.55 KB, application/octet-stream)
2012-05-04 01:39 PDT, Zan Dobersek
no flags Details
List of passing tests 05042012 1520UTC (33.87 KB, application/octet-stream)
2012-05-04 09:51 PDT, Zan Dobersek
no flags Details
Patch (19.93 KB, patch)
2012-05-04 10:16 PDT, Zan Dobersek
no flags Details | Formatted Diff | Diff
Patch (2.43 KB, patch)
2012-05-09 12:05 PDT, Zan Dobersek
no flags Details | Formatted Diff | Diff
List of passing tests 05102012 1030UTC (57.92 KB, text/plain)
2012-05-10 03:44 PDT, Zan Dobersek
no flags Details
Patch (1.67 KB, patch)
2012-05-13 22:40 PDT, Zan Dobersek
no flags Details | Formatted Diff | Diff
Updated patch (2.90 KB, patch)
2012-05-18 12:55 PDT, Zan Dobersek
no flags Details | Formatted Diff | Diff
Updated patch (3.37 KB, patch)
2012-05-23 12:27 PDT, Zan Dobersek
no flags Details | Formatted Diff | Diff
Patch proposal (5.01 KB, patch)
2012-06-12 08:04 PDT, Mario Sanchez Prada
no flags Details | Formatted Diff | Diff
Patch (7.15 KB, patch)
2012-06-24 05:21 PDT, Zan Dobersek
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Zan Dobersek 2012-05-04 01:20:39 PDT
After moving to test_expectations.txt-like handling of test failures, meaning every test is now run and the result compared to expectations, a lot of tests are now passing. Fixes for these tests probably slipped into the source tree unnoticed or even unplanned (by accident one might say), making tests pass but keeping them skipped. This bug will cover the expectation changes for these tests and their status and numbers throughout this process.

While most of these tests at the moment have wrong expectations for each build configuration that's covered by the GTK builders, some are skipped only on debug configurations. There are also test cases that fail only on 32-bit configurations - we should look into adding modifiers for test_expectations.txt that cover such configurations only (but this is probably a task for another bug).
Comment 1 Zan Dobersek 2012-05-04 01:37:01 PDT
Created attachment 140180 [details]
webkit-patch list-passing

This is just a simple command added to webkit-patch that lists tests from the GTK builders that are passing while expected to fail.

It is just a one-off tool I've written specifically for this task. The GTK builders and the number of builds to check are all hardcoded, making this command in its current form useless for any other port. I don't intend to get this patch reviewed and landed unless there is some interest by any other ports in it.

The command gathers passing tests with wrong expectations for the last 5 builds of each GTK builder and produces a list of such tests that is common to each of those builders. If wished the number of builds can be increased to get more consistent list of such tests, but this can get tricky sometimes as even the builders are unreliable (lost slaves, interrupted or broken builds etc). For each builder a common set of these tests through those last 5 builds is printed out along with a list of tests for each build that are not common. Finally, these three common sets are intersected to provide a test list of tests that passed in all the 15 builds (all the 5 builds on each of the 3 builders). Also printed out for each builder is the list of tests that passed in all the 5 builds on that builder but is not passing on other builders' builds.
Comment 2 Zan Dobersek 2012-05-04 01:39:45 PDT
Created attachment 140181 [details]
List of passing tests 05042012 0730UTC

This is the output of running `webkit-patch list-passing` on 4th of May at 7:30 UTC.

For the last 5 builds of each builder, a list of 138 tests passing on all the builders was made.
Comment 3 Zan Dobersek 2012-05-04 01:51:10 PDT
I plan to remove incorrect expectations that are not CRASH and only have a BUGWKGTK modifier without review later today. If any of the tests backfires I'll create a proper bug entry for it and link the bug # here.

For incorrect test expectations that have a BUGWK% modifier that points to a bug entry that's specific to the Gtk failure, I'll upload a patch for a review here and post a note in that bug entry about the test seeming to now pass. If, after removing incorrect expectations, the test seems OK, I'll be closing those bug entries. If the tests backfire, I'll note that in the bug entry and update the test expectations.

For incorrect test expectations that have a BUGWK% modifier that points to a bug from which the offending commit originates, I'll remove them without review and, if a test backfires, create a bug entry, linking the bug # here.

Incorrect test expectations that are CRASH will be handled with more care and removed all at once. I'll be posting the patch up here for a review.

I'll then proceed with analyzing test sets that pass only on a specific build configuration with a similar approach.
Comment 4 Zan Dobersek 2012-05-04 07:15:48 PDT
Committed r116094: <http://trac.webkit.org/changeset/116094>
Comment 5 Zan Dobersek 2012-05-04 07:17:00 PDT
(In reply to comment #4)
> Committed r116094: <http://trac.webkit.org/changeset/116094>

This is the first part, removing test expectations for tests that are passing across all the builders and had the BUGWKGTK modifier.
Comment 6 Martin Robinson 2012-05-04 08:58:26 PDT
Nice Zan. Perhaps the new tool would be useful to other ports...
Comment 7 Zan Dobersek 2012-05-04 09:51:08 PDT
Created attachment 140259 [details]
List of passing tests 05042012 1520UTC

The first removing of foul test expectations seems to have passed by without problems.

I'm attaching the output of `webkit-patch list-passing` when run on 4th of May at 15:20 UTC. A list of 138 tests passes on all the builders but has incorrect expectations listed. The next patch will require a review and will remove these expectations for about 110 tests.
Comment 8 Zan Dobersek 2012-05-04 10:16:47 PDT
Created attachment 140263 [details]
Comment 9 Zan Dobersek 2012-05-04 10:21:27 PDT
(In reply to comment #8)
> Created an attachment (id=140263) [details]
> Patch

This patch removes incorrect expectations for most of the rest of these tests. Contrary to what I planned in comment #3, I'll be updating bug entries later, after the tests either keep passing or backfire, to keep the work volume at a low level (expectations for tests from 43 bugs are being changed).
Comment 10 Zan Dobersek 2012-05-04 10:40:45 PDT
Comment on attachment 140263 [details]

Clearing flags on attachment: 140263

Committed r116122: <http://trac.webkit.org/changeset/116122>
Comment 11 Zan Dobersek 2012-05-05 10:52:17 PDT
All the test cases whose expectations were removed in r116122 and had Gtk-specific bug entries had these entries updated and closed if possible.

There are two bug entries (bug #54073 and bug #54136) that cover tests whose failures seem to be observable only by inspecting pixel output but had their expectations removed. I'll reinsert a PASS expectation into test_expectations.txt and note that while the test passes, it actually presents incorrect behavior.
Comment 12 Philippe Normand 2012-05-07 07:42:35 PDT
Good job, Zan. Thanks you!
Comment 13 Zan Dobersek 2012-05-09 12:05:07 PDT
Created attachment 140990 [details]
Comment 14 Philippe Normand 2012-05-09 13:21:30 PDT
Comment on attachment 140990 [details]

View in context: https://bugs.webkit.org/attachment.cgi?id=140990&action=review

> LayoutTests/platform/gtk/test_expectations.txt:-324
> -// This test crashes whatever test follows it. Perhaps it's related to the previous failure.
> -BUGWKGTK : fast/dom/gc-10.html = CRASH

I think I tried in the past to unflag this one but I had to back off. Let's try again anyway.
Comment 15 Zan Dobersek 2012-05-09 13:24:20 PDT
Comment on attachment 140990 [details]

Clearing flags on attachment: 140990

Committed r116553: <http://trac.webkit.org/changeset/116553>
Comment 16 Zan Dobersek 2012-05-10 03:44:10 PDT
Created attachment 141132 [details]
List of passing tests 05102012 1030UTC

The list of passing tests common to all the GTK builders is now 37 tests long. I'll remove their expectations shortly and observe their behavior, updating their bug entries appropriately.

On a different note, all the tests that had CRASH expectations removed seem to be behaving well, their bug entries will be updated soon.
Comment 17 Zan Dobersek 2012-05-10 04:20:09 PDT
Committed r116627: <http://trac.webkit.org/changeset/116627>
Comment 18 Zan Dobersek 2012-05-10 04:26:14 PDT
(In reply to comment #17)
> Committed r116627: <http://trac.webkit.org/changeset/116627>

This commit removes erroneous test expectations that are not CRASH for a couple of more tests. Their bug entries, if existent, will be updated accordingly.

This currently leaves two more tests passing that have CRASH expectations and a couple of tests who pass but probably have pixel failures. I'll upload another patch for review for the former and compile another list of the latter.
Comment 19 Zan Dobersek 2012-05-13 22:32:26 PDT
Committed r116918: <http://trac.webkit.org/changeset/116918>
Comment 20 Zan Dobersek 2012-05-13 22:40:54 PDT
Created attachment 141641 [details]
Comment 21 Zan Dobersek 2012-05-13 22:58:32 PDT
Committed r116920: <http://trac.webkit.org/changeset/116920>
Comment 22 Zan Dobersek 2012-05-18 12:55:32 PDT
Created attachment 142767 [details]
Updated patch

Remove redundant expectations for crashes in debug builds as well
Comment 23 Zan Dobersek 2012-05-23 12:27:41 PDT
Created attachment 143615 [details]
Updated patch

Removed two more crash expectations for two SVG tests that were apparently fixed somewhere between r117786 and r117795.
Comment 24 Philippe Normand 2012-05-23 15:52:42 PDT
Comment on attachment 143615 [details]
Updated patch

Ok. One minor thing though, while I think that for this specific case it's good to group the patches in a single bug... but we usually do one patch per bug :)
Comment 25 Zan Dobersek 2012-05-24 22:00:41 PDT
Comment on attachment 143615 [details]
Updated patch

Clearing flags on attachment: 143615

Committed r118474: <http://trac.webkit.org/changeset/118474>
Comment 26 Mario Sanchez Prada 2012-06-12 07:34:11 PDT
I have observed that the following list of tests have been consitently passing in GTK bots as well:

Expected to fail, but passed: (38)

I've run those tests both in the GTK 64 debug and release bots with the following commands:

  * Tools/Scripts/run-webkit-tests --gtk
  * Tools/Scripts/run-webkit-tests --gtk --iterations=10 
  * Tools/Scripts/run-webkit-tests --gtk --iterations=10 --repeat-each=10

And I get always the same (positive) result.

I'll attach a patch to unskip them promptly, asking for review.
Comment 27 Mario Sanchez Prada 2012-06-12 08:04:15 PDT
Created attachment 147088 [details]
Patch proposal

Attaching a patch that unskips the sputnik tests in that list only. I left other tests out of this patch because they were reported I can't be sure about them passing in the 32bit bot since it has been a while since that bot doesn't compile (it's fixed now, but still compiling atm) and so I can't be sure whether they are passing now or not.

Sputnik tests, in the other hand, are consistently passing in GTK 64 bit bots now and also passed in the last build in the 32bit bot that we have results for (http://build.webkit.org/builders/GTK%20Linux%2032-bit%20Release/builds/24071/steps/layout-test/logs/stdio), so I'd say it's pretty safe to unskip them.
Comment 28 Martin Robinson 2012-06-12 08:07:59 PDT
Comment on attachment 147088 [details]
Patch proposal

Let's give it a shot!
Comment 29 Mario Sanchez Prada 2012-06-12 08:11:48 PDT
Committed r120079: <http://trac.webkit.org/changeset/120079>
Comment 30 Mario Sanchez Prada 2012-06-12 09:18:40 PDT
(In reply to comment #27)
> [...]
> Sputnik tests, in the other hand, are consistently passing in GTK 64 bit bots 
> now and also passed in the last build in the 32bit bot that we have results for 
> (http://build.webkit.org/builders/GTK%20Linux%2032-bit%20Release/builds/24071/steps/layout-test/logs/stdio),
> so I'd say it's pretty safe to unskip them.

It seems the tests we unskipped are now doing well. From the list of remaining tests, I've observed only the following two ones are passing now in all the 3 GTK bots:


However, not unskipping yet because I can't tell for sure whether they are flaky or not. Let's observe them for a while more
Comment 31 Zan Dobersek 2012-06-24 05:21:00 PDT
Created attachment 149191 [details]
Comment 32 Zan Dobersek 2012-06-24 05:29:32 PDT
(In reply to comment #31)
> Created an attachment (id=149191) [details]
> Patch

This patch suggests creating a new category for test failures that occur only on specific build architectures. Tests there are marked as flaky so the expectations cover build architectures where the test passes as well.

Basically, this is a workaround for adding 32-bit and 64-bit modifiers which at the moment seems to me as an overkill (no other port seems to require it).
Comment 33 Martin Robinson 2012-07-02 10:37:08 PDT
Comment on attachment 149191 [details]

This seems reasonable!
Comment 34 Zan Dobersek 2012-07-02 10:56:15 PDT
Comment on attachment 149191 [details]

Clearing flags on attachment: 149191

Committed r121694: <http://trac.webkit.org/changeset/121694>
Comment 35 Zan Dobersek 2012-07-20 10:10:04 PDT
Committed r123224: <http://trac.webkit.org/changeset/123224>
Comment 36 Zan Dobersek 2012-07-20 10:16:52 PDT
(In reply to comment #35)
> Committed r123224: <http://trac.webkit.org/changeset/123224>

Accidentally closed the bug while committing this, but there should be no more unexpected passes on the GTK builder. If there are still any, they will be handled outside of the scope of this bug.

A bit sad that it took me more than three months to get this sorted, but still, at least now it's done.
Comment 37 Martin Robinson 2012-07-20 10:24:25 PDT
Incredible work!