Bug 55608 - rebaseline-chromium-webkit-tests doesn't work correctly with version-specific baselines
Summary: rebaseline-chromium-webkit-tests doesn't work correctly with version-specific...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests (show other bugs)
Version: 528+ (Nightly build)
Hardware: PC OS X 10.5
: P2 Normal
Assignee: Dirk Pranke
URL:
Keywords:
Depends on:
Blocks: 55191
  Show dependency treegraph
 
Reported: 2011-03-02 13:26 PST by Dirk Pranke
Modified: 2011-04-01 12:30 PDT (History)
8 users (show)

See Also:


Attachments
Patch (6.81 KB, patch)
2011-03-15 15:55 PDT, Dirk Pranke
no flags Details | Formatted Diff | Diff
Patch (8.72 KB, patch)
2011-03-15 16:10 PDT, Dirk Pranke
no flags Details | Formatted Diff | Diff
fix missing s (8.72 KB, patch)
2011-03-15 16:19 PDT, Dirk Pranke
no flags Details | Formatted Diff | Diff
make all_platforms actually be a list (8.70 KB, patch)
2011-03-15 16:29 PDT, Dirk Pranke
no flags Details | Formatted Diff | Diff
clean up mechanism for updating expectations file, make tests pass (28.65 KB, patch)
2011-03-29 21:05 PDT, Dirk Pranke
no flags Details | Formatted Diff | Diff
Patch (32.70 KB, patch)
2011-03-30 18:40 PDT, Dirk Pranke
no flags Details | Formatted Diff | Diff
ignore - wrong bug. (48.89 KB, patch)
2011-03-31 18:49 PDT, Dirk Pranke
no flags Details | Formatted Diff | Diff
rebase to HEAD (37.37 KB, patch)
2011-04-01 12:15 PDT, Dirk Pranke
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Dirk Pranke 2011-03-02 13:26:40 PST
Right now RCWT will put all of the baselines in chromium-mac, chromium-win, or chromium-linux. It may slightly screw up any baselines for Vista and Win7 as well, and doesn't handle Snow Leopard or Linux-x86_64 at all.

This should be fixed ASAP, otherwise not only can we not rebaseline the new bots, but even rebaselining the existing bots on Mac may be problematic due to the fallback path including a bunch of baselines in SL.

I should have a patch for this within 24 hours.
Comment 1 Dirk Pranke 2011-03-15 15:55:08 PDT
Created attachment 85866 [details]
Patch
Comment 2 Dirk Pranke 2011-03-15 16:10:29 PDT
Created attachment 85869 [details]
Patch
Comment 3 Dirk Pranke 2011-03-15 16:19:57 PDT
Created attachment 85871 [details]
fix missing s
Comment 4 Dirk Pranke 2011-03-15 16:29:35 PDT
Created attachment 85872 [details]
make all_platforms actually be a list
Comment 5 Dirk Pranke 2011-03-29 21:05:01 PDT
Created attachment 87458 [details]
clean up mechanism for updating expectations file, make tests pass
Comment 6 Dirk Pranke 2011-03-29 21:06:04 PDT
Okay, this patch is now in the neighborhood of being right, but it's not all the way there yet. It still needs some more tests and I need to think through the logic of deleting the lines marked "REBASELINE" from the file a little better.
Comment 7 Adam Barth 2011-03-29 23:06:36 PDT
I don't understand why I need to write "REBASELINE" in the file at all.  Why can't it just look at the bots and give me a menu to choose from?
Comment 8 Ojan Vafai 2011-03-29 23:33:34 PDT
(In reply to comment #7)
> I don't understand why I need to write "REBASELINE" in the file at all.  Why can't it just look at the bots and give me a menu to choose from?

The chromium bots run failing tests and mark them as failing in test_expectations.txt. So, there would be ~1000 tests to choose from. I agree we need a better syntax, but I think that's orthogonal to this patch. This patch is about making the current tool work right with fallback orders.

Perhaps in a separate patch we could add a flag for whether to look at the failing bots or test_expectations.txt.

The plan James and I had sketched out for "webkit-patch rebaseline" was that it would take either a list of tests as arguments or JSON if you wanted something more specific (e.g. rebaseline foo/bar.html on linux and foo/baz.html on win and linux). If no arguments are supplied, then it could look at the bots. Then we could have a GUI on the layout test dashboard that would generate this JSON and call out to a rebaseline bot, which would call webkit-patch rebaseline and post a patch to bugzilla.
Comment 9 Adam Barth 2011-03-29 23:47:55 PDT
> The chromium bots run failing tests and mark them as failing in test_expectations.txt. So, there would be ~1000 tests to choose from. I agree we need a better syntax, but I think that's orthogonal to this patch. This patch is about making the current tool work right with fallback orders.

Yeah, of course.

> Perhaps in a separate patch we could add a flag for whether to look at the failing bots or test_expectations.txt.

Sure.  I don't care about the 1000 tests that are in there.  I care about the ones that are making the bots red right now.

> The plan James and I had sketched out for "webkit-patch rebaseline" was that it would take either a list of tests as arguments or JSON if you wanted something more specific (e.g. rebaseline foo/bar.html on linux and foo/baz.html on win and linux). If no arguments are supplied, then it could look at the bots. Then we could have a GUI on the layout test dashboard that would generate this JSON and call out to a rebaseline bot, which would call webkit-patch rebaseline and post a patch to bugzilla.

That could work.  Ideally, I want a screen with all the diffs (like the flakiness dashboard) and a big button that says "these are all fine, please make this green with these expectations."
Comment 10 Dirk Pranke 2011-03-30 18:40:59 PDT
Created attachment 87652 [details]
Patch
Comment 11 Dirk Pranke 2011-03-30 18:44:49 PDT
Okay, this patch should work "correctly" insofar as it will attempt to rebaseline all of the results on the canaries and update the baselines as specified in the file.

There are two gotchas:

1) The tool will no longer delete anything from the test_expectations.txt file.

2) The tool will not yet run all of the possible test configurations; in particular, it won't rebaseline both the GPU- and non-GPU results at once, which it probably should do to be safe.

The main problem is that without actually working through all of the different configurations, the tool can't easily know whether it is safe to delete lines in the file or not. The previous logic took a stab at it, but got it wrong at least as often at it got it right. 

These two gotchas should be fixed by the patch in bug 55191, which will depend on this fix and attempt to do a better, more comprehensive job. 

We likely will not want to land this patch without also landing that one, but I'm splitting them into two to make the patches easier to follow.
Comment 12 Dirk Pranke 2011-03-31 18:49:35 PDT
Created attachment 87814 [details]
ignore - wrong bug.
Comment 13 Adam Barth 2011-03-31 20:08:01 PDT
This bug made me sad today.
Comment 14 Adam Barth 2011-03-31 20:08:44 PDT
Is attachment 87652 [details] meant to be marked obsolete, or does it need review?
Comment 15 Dirk Pranke 2011-03-31 20:22:53 PDT
Comment on attachment 87652 [details]
Patch

it needs review. R+ this and the 55191 and happiness will be yours (hopefully!)
Comment 16 Dirk Pranke 2011-03-31 20:23:44 PDT
(In reply to comment #15)
> (From update of attachment 87652 [details])
> it needs review. R+ this and the 55191 and happiness will be yours (hopefully!)

Or better, try the patches locally and let me know if you have any issues.
Comment 17 Dirk Pranke 2011-03-31 20:24:10 PDT
(In reply to comment #16)
> (In reply to comment #15)
> > (From update of attachment 87652 [details] [details])
> > it needs review. R+ this and the 55191 and happiness will be yours (hopefully!)
> 
> Or better, try the patches locally and let me know if you have any issues.

I haven't actually tried them with real-world rebaselines yet, but will be doing so tomorrow.
Comment 18 Adam Barth 2011-03-31 20:45:20 PDT
> it needs review. R+ this and the 55191 and happiness will be yours (hopefully!)

Oh how you lure me with your siren song.
Comment 19 Adam Barth 2011-03-31 20:47:23 PDT
Comment on attachment 87652 [details]
Patch

So magical.
Comment 20 Dirk Pranke 2011-04-01 12:15:45 PDT
Created attachment 87894 [details]
rebase to HEAD
Comment 21 Dirk Pranke 2011-04-01 12:18:55 PDT
Committed r82704: <http://trac.webkit.org/changeset/82704>
Comment 22 Tony Chang 2011-04-01 12:23:29 PDT
Comment on attachment 87894 [details]
rebase to HEAD

View in context: https://bugs.webkit.org/attachment.cgi?id=87894&action=review

> Tools/Scripts/webkitpy/layout_tests/rebaseline_chromium_webkit_tests_unittest.py:258
> +        #self.assertEqual(filesystem.files['/test.checkout/LayoutTests/platform/test/test_expectations.txt'], '')

Did you mean to check in with this line commented out?  Can we update it to the actual result?

> Tools/Scripts/webkitpy/layout_tests/rebaseline_chromium_webkit_tests_unittest.py:277
> +        #self.assertEqual(filesystem.files['/test.checkout/LayoutTests/platform/test/test_expectations.txt'], '')

This too.
Comment 23 Dirk Pranke 2011-04-01 12:30:09 PDT
(In reply to comment #22)
> (From update of attachment 87894 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=87894&action=review
> 
> > Tools/Scripts/webkitpy/layout_tests/rebaseline_chromium_webkit_tests_unittest.py:258
> > +        #self.assertEqual(filesystem.files['/test.checkout/LayoutTests/platform/test/test_expectations.txt'], '')
> 
> Did you mean to check in with this line commented out?  Can we update it to the actual result?
> 

Yes. In 55608 we stopped updating the test_expectations file altogether, but we resumed updating it in 55191. So I commented this out temporarily with the intent to uncomment in the next patch.

Then, when testing 55191, I realized that I had moved the updating of the test_expectations class out of Rebaseliner.run() altogether (it's now done in realMain), so this assert would never pass. So I deleted these lines completely prior to actually landing that patch.

The other unit tests do test that we update the file correctly, so I didn't think this particular assert was needed.