Bug 72746

Summary: new baselines for crbug 104128
Product: WebKit Reporter: epoger
Component: Tools / TestsAssignee: Elliot Poger <epoger>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, adamk, dglazkov, epoger, eric, pkasting, reed, schenney, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
patch webkit.review.bot: commit-queue-

Description epoger 2011-11-18 12:01:11 PST
new baselines for crbug 104128
Comment 1 epoger 2011-11-18 12:47:18 PST
Created attachment 115849 [details]
patch
Comment 2 Peter Kasting 2011-11-18 12:54:35 PST
Patches that are just new baselines don't need review.  You can mark them "Unreviewed, gardening" or "Unreviewed, new baselines" or something similar and just commit.

You should probably comment, though, on how you made these, and how and why these differ from the checked-in baselines.
Comment 3 epoger 2011-11-18 13:16:13 PST
(In reply to comment #2)
> Patches that are just new baselines don't need review.  You can mark them "Unreviewed, gardening" or "Unreviewed, new baselines" or something similar and just commit.
> 
> You should probably comment, though, on how you made these, and how and why these differ from the checked-in baselines.

Hey, I was planning to add some comments over in http://crbug.com/104128 ('enable colorfilter optimization') but you beat me to the punch. :-)

I was planning on landing these changes manually (not via commit-queue) but I wanted you to be able to take a look first (given our conversation earlier).  So I figured this was a good way to do that.

I generated these by using "webkit-patch rebaseline-expectations" as documented at http://trac.webkit.org/wiki/Rebaseline .  I'm not sure why these differ from the checked-in baselines, but if that tool is working as advertised, this should allow us to remove those lines from test_expectations.txt.  The image diffs are negligible, and as expected. (We viewed before and after for all of them.)

Also note- in 20 minutes, I have to leave for the day.  Here's my proposal-please tell me what you think:
- on Monday morning, I will re-run "webkit-patch rebaseline-expectations" as I did to create this patch, in a fresh webkit checkout
- the results will probably be quite similar to what we see here
- I will work with a WebKit committer to land the change
- I will watch the webkit canaries like a hawk watches... canaries

Does that sound right?  (If you prefer, you can certainly land the patch as-is, but I won't be around to watch it.)
Comment 4 Peter Kasting 2011-11-18 18:36:16 PST
Adam and Eric, I'm CCing you here because I'm not sure how to best proceed and the question involves how webkit-patch works.

The baselines in the attached patch were generated with webkit-patch rebaseline-expectations run on a Mac 10.6 machine.  The expectations themselves -- I think -- were modified such that REBASELINE was added to all the lines that have been deleted in this patch, and the lines were not otherwise modified.  As you can see if you look, most of these lines were simply "BUGXYZ CPU : ...".

Now take for example the case of box-shadow-v.html.  The new baselines generated by the tool will, among other things, cause the Mac 10.6 CG bot, which is currently using a chromium-cg-mac baseline, to instead fall back to the platform/mac baseline (because it deletes the other one).  But those baselines currently differ, and the bot is currently passing, which means that these new baselines ought to cause that bot to start failing.  This is clearly undesirable.

What are we doing wrong here?  Is the script getting confused because the "CPU" expectation makes it not pull CG results, but then it tries to optimize all platforms based on the result subset it has?  Should we then make sure that we only run webkit-patch rebaseline-expectations on expectations that have REBASELINE with no other platform qualifiers?  And if so are we then expected to manually undo any "optimizations" to platforms we intended to leave in their current state (e.g. because they're failing and we don't want to paper over the failure)?

This week I was gardening with Steve Block in London and both of us had a lot of trouble with rebaselines generated by this tool causing other failures on checkin, so I'm wondering whether it's buggy or this is operator error.
Comment 5 Adam Barth 2011-11-18 18:41:32 PST
I'm sorry you had a bad experience with webkit-patch rebaseline-expectations.  There are a few options available:

1) We can try to debug the problem with rebaseline-expectations.  6:30pm on Friday is not an ideal time for this option, but we can go this route if you'd like.

2) We can use the old tool to see if it works correctly in this case.

3) We can land these rebaselines and then use garden-o-matic to rebaseline any failures (presumably back to their previous values).  Garden-o-matic is supposed to converge to the correct results, but possibly not always on the first try (there's a trade-off here when dealing with slow bots).

4) We can move the expected results around manually and investigate / correct the issue on Monday.
Comment 6 WebKit Review Bot 2011-11-18 19:05:16 PST
Comment on attachment 115849 [details]
patch

Attachment 115849 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/10519294

New failing tests:
fast/canvas/shadow-offset-7.html
fast/canvas/shadow-offset-6.html
fast/canvas/shadow-offset-4.html
fast/repaint/box-shadow-h.html
fast/replaced/width100percent-textarea.html
fast/box-shadow/inset-box-shadow-radius.html
fast/repaint/moving-shadow-on-container.html
fast/box-shadow/box-shadow-radius.html
fast/box-shadow/shadow-buffer-partial.html
fast/canvas/shadow-offset-5.html
fast/repaint/text-shadow.html
fast/repaint/box-shadow-v.html
fast/repaint/text-shadow-horizontal.html
Comment 7 epoger 2011-11-20 23:30:32 PST
(In reply to comment #4)

> This week I was gardening with Steve Block in London and both of us had a lot of trouble with rebaselines generated by this tool causing other failures on checkin, so I'm wondering whether it's buggy or this is operator error.

See https://bugs.webkit.org/show_bug.cgi?id=72860 ('proposal: separate baseline optimization from new baseline generation')
Comment 8 epoger 2011-11-22 13:13:58 PST
update: this is still blocked by apparent disagreement between various rebaselining tools, run by different people on different platforms.

I am going to try to make "webkit-patch rebaseline-expectations" just grab new baselines without optimization and see how that goes.  But it may be next week before I get to that.

In the meanwhile, are these test_expectations causing serious problems for anyone?
Comment 9 Adam Klein 2011-11-22 13:17:34 PST
(In reply to comment #8)
> update: this is still blocked by apparent disagreement between various rebaselining tools, run by different people on different platforms.
> 
> I am going to try to make "webkit-patch rebaseline-expectations" just grab new baselines without optimization and see how that goes.  But it may be next week before I get to that.

Sounds good, thanks.

> In the meanwhile, are these test_expectations causing serious problems for anyone?

Not "serious" problems, but they do add noise to the already-very-noisy gardening tools (e.g., on Linux the tests are passing, so they show "unexpectedly passed"; that could probably be avoided by further tinkering with the expectations instead of rebaselining but it's hard to tell that without the full context of this situation).
Comment 10 Stephen Chenney 2012-04-17 10:25:10 PDT
Committed r114393: <http://trac.webkit.org/changeset/114393>
Comment 11 Peter Kasting 2012-07-31 21:17:36 PDT
All tests referenced in the patch here have now been rebaselined or are covered by other bugs.  Last new baselines were landed in r124296.