55608 – rebaseline-chromium-webkit-tests doesn't work correctly with version-specific baselines

RESOLVED FIXED 55608

rebaseline-chromium-webkit-tests doesn't work correctly with version-specific baselines

https://bugs.webkit.org/show_bug.cgi?id=55608

Summary rebaseline-chromium-webkit-tests doesn't work correctly with version-specific...

Dirk Pranke

Reported 2011-03-02 13:26:40 PST

Right now RCWT will put all of the baselines in chromium-mac, chromium-win, or chromium-linux. It may slightly screw up any baselines for Vista and Win7 as well, and doesn't handle Snow Leopard or Linux-x86_64 at all. This should be fixed ASAP, otherwise not only can we not rebaseline the new bots, but even rebaselining the existing bots on Mac may be problematic due to the fallback path including a bunch of baselines in SL. I should have a patch for this within 24 hours.

Attachments
Patch (6.81 KB, patch) 2011-03-15 15:55 PDT, Dirk Pranke	no flags	Details Formatted Diff Diff
Patch (8.72 KB, patch) 2011-03-15 16:10 PDT, Dirk Pranke	no flags	Details Formatted Diff Diff
fix missing s (8.72 KB, patch) 2011-03-15 16:19 PDT, Dirk Pranke	no flags	Details Formatted Diff Diff
make all_platforms actually be a list (8.70 KB, patch) 2011-03-15 16:29 PDT, Dirk Pranke	no flags	Details Formatted Diff Diff
clean up mechanism for updating expectations file, make tests pass (28.65 KB, patch) 2011-03-29 21:05 PDT, Dirk Pranke	no flags	Details Formatted Diff Diff
Patch (32.70 KB, patch) 2011-03-30 18:40 PDT, Dirk Pranke	no flags	Details Formatted Diff Diff
ignore - wrong bug. (48.89 KB, patch) 2011-03-31 18:49 PDT, Dirk Pranke	no flags	Details Formatted Diff Diff
rebase to HEAD (37.37 KB, patch) 2011-04-01 12:15 PDT, Dirk Pranke	no flags	Details Formatted Diff Diff
Show Obsolete (7) View All Add attachment proposed patch, testcase, etc.

Dirk Pranke

Comment 1 2011-03-15 15:55:08 PDT

Created attachment 85866 [details] Patch

Dirk Pranke

Comment 2 2011-03-15 16:10:29 PDT

Created attachment 85869 [details] Patch

Dirk Pranke

Comment 3 2011-03-15 16:19:57 PDT

Created attachment 85871 [details] fix missing s

Dirk Pranke

Comment 4 2011-03-15 16:29:35 PDT

Created attachment 85872 [details] make all_platforms actually be a list

Dirk Pranke

Comment 5 2011-03-29 21:05:01 PDT

Created attachment 87458 [details] clean up mechanism for updating expectations file, make tests pass

Dirk Pranke

Comment 6 2011-03-29 21:06:04 PDT

Okay, this patch is now in the neighborhood of being right, but it's not all the way there yet. It still needs some more tests and I need to think through the logic of deleting the lines marked "REBASELINE" from the file a little better.

Adam Barth

Comment 7 2011-03-29 23:06:36 PDT

I don't understand why I need to write "REBASELINE" in the file at all. Why can't it just look at the bots and give me a menu to choose from?

Ojan Vafai

Comment 8 2011-03-29 23:33:34 PDT

(In reply to comment #7) > I don't understand why I need to write "REBASELINE" in the file at all. Why can't it just look at the bots and give me a menu to choose from? The chromium bots run failing tests and mark them as failing in test_expectations.txt. So, there would be ~1000 tests to choose from. I agree we need a better syntax, but I think that's orthogonal to this patch. This patch is about making the current tool work right with fallback orders. Perhaps in a separate patch we could add a flag for whether to look at the failing bots or test_expectations.txt. The plan James and I had sketched out for "webkit-patch rebaseline" was that it would take either a list of tests as arguments or JSON if you wanted something more specific (e.g. rebaseline foo/bar.html on linux and foo/baz.html on win and linux). If no arguments are supplied, then it could look at the bots. Then we could have a GUI on the layout test dashboard that would generate this JSON and call out to a rebaseline bot, which would call webkit-patch rebaseline and post a patch to bugzilla.

Adam Barth

Comment 9 2011-03-29 23:47:55 PDT

> The chromium bots run failing tests and mark them as failing in test_expectations.txt. So, there would be ~1000 tests to choose from. I agree we need a better syntax, but I think that's orthogonal to this patch. This patch is about making the current tool work right with fallback orders. Yeah, of course. > Perhaps in a separate patch we could add a flag for whether to look at the failing bots or test_expectations.txt. Sure. I don't care about the 1000 tests that are in there. I care about the ones that are making the bots red right now. > The plan James and I had sketched out for "webkit-patch rebaseline" was that it would take either a list of tests as arguments or JSON if you wanted something more specific (e.g. rebaseline foo/bar.html on linux and foo/baz.html on win and linux). If no arguments are supplied, then it could look at the bots. Then we could have a GUI on the layout test dashboard that would generate this JSON and call out to a rebaseline bot, which would call webkit-patch rebaseline and post a patch to bugzilla. That could work. Ideally, I want a screen with all the diffs (like the flakiness dashboard) and a big button that says "these are all fine, please make this green with these expectations."

Dirk Pranke

Comment 10 2011-03-30 18:40:59 PDT

Created attachment 87652 [details] Patch

Dirk Pranke

Comment 11 2011-03-30 18:44:49 PDT

Okay, this patch should work "correctly" insofar as it will attempt to rebaseline all of the results on the canaries and update the baselines as specified in the file. There are two gotchas: 1) The tool will no longer delete anything from the test_expectations.txt file. 2) The tool will not yet run all of the possible test configurations; in particular, it won't rebaseline both the GPU- and non-GPU results at once, which it probably should do to be safe. The main problem is that without actually working through all of the different configurations, the tool can't easily know whether it is safe to delete lines in the file or not. The previous logic took a stab at it, but got it wrong at least as often at it got it right. These two gotchas should be fixed by the patch in bug 55191, which will depend on this fix and attempt to do a better, more comprehensive job. We likely will not want to land this patch without also landing that one, but I'm splitting them into two to make the patches easier to follow.

Dirk Pranke

Comment 12 2011-03-31 18:49:35 PDT

Created attachment 87814 [details] ignore - wrong bug.

Adam Barth

Comment 13 2011-03-31 20:08:01 PDT

This bug made me sad today.

Adam Barth

Comment 14 2011-03-31 20:08:44 PDT

Is attachment 87652 [details] meant to be marked obsolete, or does it need review?

Dirk Pranke

Comment 15 2011-03-31 20:22:53 PDT

Comment on attachment 87652 [details] Patch it needs review. R+ this and the 55191 and happiness will be yours (hopefully!)

Dirk Pranke

Comment 16 2011-03-31 20:23:44 PDT

(In reply to comment #15) > (From update of attachment 87652 [details]) > it needs review. R+ this and the 55191 and happiness will be yours (hopefully!) Or better, try the patches locally and let me know if you have any issues.

Dirk Pranke

Comment 17 2011-03-31 20:24:10 PDT

(In reply to comment #16) > (In reply to comment #15) > > (From update of attachment 87652 [details] [details]) > > it needs review. R+ this and the 55191 and happiness will be yours (hopefully!) > > Or better, try the patches locally and let me know if you have any issues. I haven't actually tried them with real-world rebaselines yet, but will be doing so tomorrow.

Adam Barth

Comment 18 2011-03-31 20:45:20 PDT

> it needs review. R+ this and the 55191 and happiness will be yours (hopefully!) Oh how you lure me with your siren song.

Adam Barth

Comment 19 2011-03-31 20:47:23 PDT

Comment on attachment 87652 [details] Patch So magical.

Dirk Pranke

Comment 20 2011-04-01 12:15:45 PDT

Created attachment 87894 [details] rebase to HEAD

Dirk Pranke

Comment 21 2011-04-01 12:18:55 PDT

Committed r82704: <http://trac.webkit.org/changeset/82704>

Tony Chang

Comment 22 2011-04-01 12:23:29 PDT

Comment on attachment 87894 [details] rebase to HEAD View in context: https://bugs.webkit.org/attachment.cgi?id=87894&action=review > Tools/Scripts/webkitpy/layout_tests/rebaseline_chromium_webkit_tests_unittest.py:258 > + #self.assertEqual(filesystem.files['/test.checkout/LayoutTests/platform/test/test_expectations.txt'], '') Did you mean to check in with this line commented out? Can we update it to the actual result? > Tools/Scripts/webkitpy/layout_tests/rebaseline_chromium_webkit_tests_unittest.py:277 > + #self.assertEqual(filesystem.files['/test.checkout/LayoutTests/platform/test/test_expectations.txt'], '') This too.

Dirk Pranke

Comment 23 2011-04-01 12:30:09 PDT

(In reply to comment #22) > (From update of attachment 87894 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=87894&action=review > > > Tools/Scripts/webkitpy/layout_tests/rebaseline_chromium_webkit_tests_unittest.py:258 > > + #self.assertEqual(filesystem.files['/test.checkout/LayoutTests/platform/test/test_expectations.txt'], '') > > Did you mean to check in with this line commented out? Can we update it to the actual result? > Yes. In 55608 we stopped updating the test_expectations file altogether, but we resumed updating it in 55191. So I commented this out temporarily with the intent to uncomment in the next patch. Then, when testing 55191, I realized that I had moved the updating of the test_expectations class out of Rebaseliner.run() altogether (it's now done in realMain), so this assert would never pass. So I deleted these lines completely prior to actually landing that patch. The other unit tests do test that we update the file correctly, so I didn't think this particular assert was needed.

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware PC

OS OS X 10.5

Product WebKit

Component Tools / Tests

Assignee

Dirk Pranke

Reported

2011-03-02 13:26 PST

Modified

2011-04-01 12:30 PDT History

CC List

8 users Show

URL

Keywords

Depends on

Blocks

55191

Dependencies

tree graph