new baselines for crbug 104128
Created attachment 115849 [details]
Patches that are just new baselines don't need review. You can mark them "Unreviewed, gardening" or "Unreviewed, new baselines" or something similar and just commit.
You should probably comment, though, on how you made these, and how and why these differ from the checked-in baselines.
(In reply to comment #2)
> Patches that are just new baselines don't need review. You can mark them "Unreviewed, gardening" or "Unreviewed, new baselines" or something similar and just commit.
> You should probably comment, though, on how you made these, and how and why these differ from the checked-in baselines.
Hey, I was planning to add some comments over in http://crbug.com/104128 ('enable colorfilter optimization') but you beat me to the punch. :-)
I was planning on landing these changes manually (not via commit-queue) but I wanted you to be able to take a look first (given our conversation earlier). So I figured this was a good way to do that.
I generated these by using "webkit-patch rebaseline-expectations" as documented at http://trac.webkit.org/wiki/Rebaseline . I'm not sure why these differ from the checked-in baselines, but if that tool is working as advertised, this should allow us to remove those lines from test_expectations.txt. The image diffs are negligible, and as expected. (We viewed before and after for all of them.)
Also note- in 20 minutes, I have to leave for the day. Here's my proposal-please tell me what you think:
- on Monday morning, I will re-run "webkit-patch rebaseline-expectations" as I did to create this patch, in a fresh webkit checkout
- the results will probably be quite similar to what we see here
- I will work with a WebKit committer to land the change
- I will watch the webkit canaries like a hawk watches... canaries
Does that sound right? (If you prefer, you can certainly land the patch as-is, but I won't be around to watch it.)
Adam and Eric, I'm CCing you here because I'm not sure how to best proceed and the question involves how webkit-patch works.
The baselines in the attached patch were generated with webkit-patch rebaseline-expectations run on a Mac 10.6 machine. The expectations themselves -- I think -- were modified such that REBASELINE was added to all the lines that have been deleted in this patch, and the lines were not otherwise modified. As you can see if you look, most of these lines were simply "BUGXYZ CPU : ...".
Now take for example the case of box-shadow-v.html. The new baselines generated by the tool will, among other things, cause the Mac 10.6 CG bot, which is currently using a chromium-cg-mac baseline, to instead fall back to the platform/mac baseline (because it deletes the other one). But those baselines currently differ, and the bot is currently passing, which means that these new baselines ought to cause that bot to start failing. This is clearly undesirable.
What are we doing wrong here? Is the script getting confused because the "CPU" expectation makes it not pull CG results, but then it tries to optimize all platforms based on the result subset it has? Should we then make sure that we only run webkit-patch rebaseline-expectations on expectations that have REBASELINE with no other platform qualifiers? And if so are we then expected to manually undo any "optimizations" to platforms we intended to leave in their current state (e.g. because they're failing and we don't want to paper over the failure)?
This week I was gardening with Steve Block in London and both of us had a lot of trouble with rebaselines generated by this tool causing other failures on checkin, so I'm wondering whether it's buggy or this is operator error.
I'm sorry you had a bad experience with webkit-patch rebaseline-expectations. There are a few options available:
1) We can try to debug the problem with rebaseline-expectations. 6:30pm on Friday is not an ideal time for this option, but we can go this route if you'd like.
2) We can use the old tool to see if it works correctly in this case.
3) We can land these rebaselines and then use garden-o-matic to rebaseline any failures (presumably back to their previous values). Garden-o-matic is supposed to converge to the correct results, but possibly not always on the first try (there's a trade-off here when dealing with slow bots).
4) We can move the expected results around manually and investigate / correct the issue on Monday.
Comment on attachment 115849 [details]
Attachment 115849 [details] did not pass chromium-ews (chromium-xvfb):
New failing tests:
(In reply to comment #4)
> This week I was gardening with Steve Block in London and both of us had a lot of trouble with rebaselines generated by this tool causing other failures on checkin, so I'm wondering whether it's buggy or this is operator error.
See https://bugs.webkit.org/show_bug.cgi?id=72860 ('proposal: separate baseline optimization from new baseline generation')
update: this is still blocked by apparent disagreement between various rebaselining tools, run by different people on different platforms.
I am going to try to make "webkit-patch rebaseline-expectations" just grab new baselines without optimization and see how that goes. But it may be next week before I get to that.
In the meanwhile, are these test_expectations causing serious problems for anyone?
(In reply to comment #8)
> update: this is still blocked by apparent disagreement between various rebaselining tools, run by different people on different platforms.
> I am going to try to make "webkit-patch rebaseline-expectations" just grab new baselines without optimization and see how that goes. But it may be next week before I get to that.
Sounds good, thanks.
> In the meanwhile, are these test_expectations causing serious problems for anyone?
Not "serious" problems, but they do add noise to the already-very-noisy gardening tools (e.g., on Linux the tests are passing, so they show "unexpectedly passed"; that could probably be avoided by further tinkering with the expectations instead of rebaselining but it's hard to tell that without the full context of this situation).
Committed r114393: <http://trac.webkit.org/changeset/114393>
All tests referenced in the patch here have now been rebaselined or are covered by other bugs. Last new baselines were landed in r124296.