RESOLVED FIXED 55608
rebaseline-chromium-webkit-tests doesn't work correctly with version-specific baselines
https://bugs.webkit.org/show_bug.cgi?id=55608
Summary rebaseline-chromium-webkit-tests doesn't work correctly with version-specific...
Dirk Pranke
Reported 2011-03-02 13:26:40 PST
Right now RCWT will put all of the baselines in chromium-mac, chromium-win, or chromium-linux. It may slightly screw up any baselines for Vista and Win7 as well, and doesn't handle Snow Leopard or Linux-x86_64 at all. This should be fixed ASAP, otherwise not only can we not rebaseline the new bots, but even rebaselining the existing bots on Mac may be problematic due to the fallback path including a bunch of baselines in SL. I should have a patch for this within 24 hours.
Attachments
Patch (6.81 KB, patch)
2011-03-15 15:55 PDT, Dirk Pranke
no flags
Patch (8.72 KB, patch)
2011-03-15 16:10 PDT, Dirk Pranke
no flags
fix missing s (8.72 KB, patch)
2011-03-15 16:19 PDT, Dirk Pranke
no flags
make all_platforms actually be a list (8.70 KB, patch)
2011-03-15 16:29 PDT, Dirk Pranke
no flags
clean up mechanism for updating expectations file, make tests pass (28.65 KB, patch)
2011-03-29 21:05 PDT, Dirk Pranke
no flags
Patch (32.70 KB, patch)
2011-03-30 18:40 PDT, Dirk Pranke
no flags
ignore - wrong bug. (48.89 KB, patch)
2011-03-31 18:49 PDT, Dirk Pranke
no flags
rebase to HEAD (37.37 KB, patch)
2011-04-01 12:15 PDT, Dirk Pranke
no flags
Dirk Pranke
Comment 1 2011-03-15 15:55:08 PDT
Dirk Pranke
Comment 2 2011-03-15 16:10:29 PDT
Dirk Pranke
Comment 3 2011-03-15 16:19:57 PDT
Created attachment 85871 [details] fix missing s
Dirk Pranke
Comment 4 2011-03-15 16:29:35 PDT
Created attachment 85872 [details] make all_platforms actually be a list
Dirk Pranke
Comment 5 2011-03-29 21:05:01 PDT
Created attachment 87458 [details] clean up mechanism for updating expectations file, make tests pass
Dirk Pranke
Comment 6 2011-03-29 21:06:04 PDT
Okay, this patch is now in the neighborhood of being right, but it's not all the way there yet. It still needs some more tests and I need to think through the logic of deleting the lines marked "REBASELINE" from the file a little better.
Adam Barth
Comment 7 2011-03-29 23:06:36 PDT
I don't understand why I need to write "REBASELINE" in the file at all. Why can't it just look at the bots and give me a menu to choose from?
Ojan Vafai
Comment 8 2011-03-29 23:33:34 PDT
(In reply to comment #7) > I don't understand why I need to write "REBASELINE" in the file at all. Why can't it just look at the bots and give me a menu to choose from? The chromium bots run failing tests and mark them as failing in test_expectations.txt. So, there would be ~1000 tests to choose from. I agree we need a better syntax, but I think that's orthogonal to this patch. This patch is about making the current tool work right with fallback orders. Perhaps in a separate patch we could add a flag for whether to look at the failing bots or test_expectations.txt. The plan James and I had sketched out for "webkit-patch rebaseline" was that it would take either a list of tests as arguments or JSON if you wanted something more specific (e.g. rebaseline foo/bar.html on linux and foo/baz.html on win and linux). If no arguments are supplied, then it could look at the bots. Then we could have a GUI on the layout test dashboard that would generate this JSON and call out to a rebaseline bot, which would call webkit-patch rebaseline and post a patch to bugzilla.
Adam Barth
Comment 9 2011-03-29 23:47:55 PDT
> The chromium bots run failing tests and mark them as failing in test_expectations.txt. So, there would be ~1000 tests to choose from. I agree we need a better syntax, but I think that's orthogonal to this patch. This patch is about making the current tool work right with fallback orders. Yeah, of course. > Perhaps in a separate patch we could add a flag for whether to look at the failing bots or test_expectations.txt. Sure. I don't care about the 1000 tests that are in there. I care about the ones that are making the bots red right now. > The plan James and I had sketched out for "webkit-patch rebaseline" was that it would take either a list of tests as arguments or JSON if you wanted something more specific (e.g. rebaseline foo/bar.html on linux and foo/baz.html on win and linux). If no arguments are supplied, then it could look at the bots. Then we could have a GUI on the layout test dashboard that would generate this JSON and call out to a rebaseline bot, which would call webkit-patch rebaseline and post a patch to bugzilla. That could work. Ideally, I want a screen with all the diffs (like the flakiness dashboard) and a big button that says "these are all fine, please make this green with these expectations."
Dirk Pranke
Comment 10 2011-03-30 18:40:59 PDT
Dirk Pranke
Comment 11 2011-03-30 18:44:49 PDT
Okay, this patch should work "correctly" insofar as it will attempt to rebaseline all of the results on the canaries and update the baselines as specified in the file. There are two gotchas: 1) The tool will no longer delete anything from the test_expectations.txt file. 2) The tool will not yet run all of the possible test configurations; in particular, it won't rebaseline both the GPU- and non-GPU results at once, which it probably should do to be safe. The main problem is that without actually working through all of the different configurations, the tool can't easily know whether it is safe to delete lines in the file or not. The previous logic took a stab at it, but got it wrong at least as often at it got it right. These two gotchas should be fixed by the patch in bug 55191, which will depend on this fix and attempt to do a better, more comprehensive job. We likely will not want to land this patch without also landing that one, but I'm splitting them into two to make the patches easier to follow.
Dirk Pranke
Comment 12 2011-03-31 18:49:35 PDT
Created attachment 87814 [details] ignore - wrong bug.
Adam Barth
Comment 13 2011-03-31 20:08:01 PDT
This bug made me sad today.
Adam Barth
Comment 14 2011-03-31 20:08:44 PDT
Is attachment 87652 [details] meant to be marked obsolete, or does it need review?
Dirk Pranke
Comment 15 2011-03-31 20:22:53 PDT
Comment on attachment 87652 [details] Patch it needs review. R+ this and the 55191 and happiness will be yours (hopefully!)
Dirk Pranke
Comment 16 2011-03-31 20:23:44 PDT
(In reply to comment #15) > (From update of attachment 87652 [details]) > it needs review. R+ this and the 55191 and happiness will be yours (hopefully!) Or better, try the patches locally and let me know if you have any issues.
Dirk Pranke
Comment 17 2011-03-31 20:24:10 PDT
(In reply to comment #16) > (In reply to comment #15) > > (From update of attachment 87652 [details] [details]) > > it needs review. R+ this and the 55191 and happiness will be yours (hopefully!) > > Or better, try the patches locally and let me know if you have any issues. I haven't actually tried them with real-world rebaselines yet, but will be doing so tomorrow.
Adam Barth
Comment 18 2011-03-31 20:45:20 PDT
> it needs review. R+ this and the 55191 and happiness will be yours (hopefully!) Oh how you lure me with your siren song.
Adam Barth
Comment 19 2011-03-31 20:47:23 PDT
Comment on attachment 87652 [details] Patch So magical.
Dirk Pranke
Comment 20 2011-04-01 12:15:45 PDT
Created attachment 87894 [details] rebase to HEAD
Dirk Pranke
Comment 21 2011-04-01 12:18:55 PDT
Tony Chang
Comment 22 2011-04-01 12:23:29 PDT
Comment on attachment 87894 [details] rebase to HEAD View in context: https://bugs.webkit.org/attachment.cgi?id=87894&action=review > Tools/Scripts/webkitpy/layout_tests/rebaseline_chromium_webkit_tests_unittest.py:258 > + #self.assertEqual(filesystem.files['/test.checkout/LayoutTests/platform/test/test_expectations.txt'], '') Did you mean to check in with this line commented out? Can we update it to the actual result? > Tools/Scripts/webkitpy/layout_tests/rebaseline_chromium_webkit_tests_unittest.py:277 > + #self.assertEqual(filesystem.files['/test.checkout/LayoutTests/platform/test/test_expectations.txt'], '') This too.
Dirk Pranke
Comment 23 2011-04-01 12:30:09 PDT
(In reply to comment #22) > (From update of attachment 87894 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=87894&action=review > > > Tools/Scripts/webkitpy/layout_tests/rebaseline_chromium_webkit_tests_unittest.py:258 > > + #self.assertEqual(filesystem.files['/test.checkout/LayoutTests/platform/test/test_expectations.txt'], '') > > Did you mean to check in with this line commented out? Can we update it to the actual result? > Yes. In 55608 we stopped updating the test_expectations file altogether, but we resumed updating it in 55191. So I commented this out temporarily with the intent to uncomment in the next patch. Then, when testing 55191, I realized that I had moved the updating of the test_expectations class out of Rebaseliner.run() altogether (it's now done in realMain), so this assert would never pass. So I deleted these lines completely prior to actually landing that patch. The other unit tests do test that we update the file correctly, so I didn't think this particular assert was needed.
Note You need to log in before you can comment on or make changes to this bug.