Bug 88561 - "webkit-patch rebaseline-expectations" added bogus TEXT changes when rebaselining IMAGE failures
Summary: "webkit-patch rebaseline-expectations" added bogus TEXT changes when rebaseli...
Status: RESOLVED DUPLICATE of bug 88581
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: WebKit Review Bot
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-06-07 11:32 PDT by WebKit Review Bot
Modified: 2012-06-11 07:11 PDT (History)
5 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description WebKit Review Bot 2012-06-07 11:32:44 PDT
http://trac.webkit.org/changeset/119737 broke the build:
This patch broke a bunch of mac bots (Requested by epoger on #webkit).

This is an automatic bug report generated by the sheriff-bot. If this bug
report was created because of a flaky test, please file a bug for the flaky
test (if we don't already have one on file) and dup this bug against that bug
so that we can track how often these flaky tests case pain.

"Only you can prevent forest fires." -- Smokey the Bear
Comment 1 Elliot Poger 2012-06-07 11:41:09 PDT
Here are the failures from http://build.chromium.org/p/chromium.webkit/builders/Webkit%20Mac10.6/builds/17559/steps/webkit_tests/logs/stdio :

Regressions: Unexpected text diff mismatch : (18)
  css1/basic/containment.html = TEXT
  css1/basic/id_as_selector.html = TEXT
  css1/box_properties/border_bottom.html = TEXT
  css1/box_properties/border_left.html = TEXT
  css1/box_properties/border_right_inline.html = TEXT
  css1/box_properties/border_top.html = TEXT
  css1/box_properties/clear_float.html = TEXT
  css1/box_properties/margin_left.html = TEXT
  css1/box_properties/margin_right.html = TEXT
  css1/box_properties/padding_left.html = TEXT
  css1/box_properties/padding_right.html = TEXT
  css1/cascade/cascade_order.html = TEXT
  css1/classification/list_style_type.html = TEXT
  css1/pseudo/anchor.html = TEXT
  css2.1/20110323/replaced-intrinsic-ratio-001.htm = TEXT
  editing/deleting/delete-after-span-ws-001.html = TEXT
  editing/deleting/delete-after-span-ws-002.html = TEXT
  editing/deleting/delete-after-span-ws-003.html = TEXT
Comment 2 Elliot Poger 2012-06-07 11:44:40 PDT
I don't know how my change caused those failures.  I created my CL using "webkit-patch rebaseline-expectations", as documented in http://code.google.com/p/skia/wiki/LandAndRebaselineLayoutTests

Why would webkit-patch rebaseline-expectations create text files that fail?
Comment 3 Elliot Poger 2012-06-07 12:28:24 PDT
Committed r119746: <http://trac.webkit.org/changeset/119746>
Comment 4 Elliot Poger 2012-06-07 12:41:43 PDT
I created http://trac.webkit.org/changeset/119746 , shown above, using:
Tools/Scripts/webkit-patch rollout 119737 "see https://bugs.webkit.org/show_bug.cgi?id=88561"

So that will hopefully revert http://trac.webkit.org/changeset/119737 and fix the broken waterfall bots.  (I'll check on them in a bit)

But the bug is still open... why did "webkit-patch rebaseline-expectations" create incorrect results?
Comment 5 Dirk Pranke 2012-06-07 12:46:44 PDT
what did it do?
Comment 6 Ryosuke Niwa 2012-06-07 12:50:39 PDT
(In reply to comment #4)
> I created http://trac.webkit.org/changeset/119746 , shown above, using:
> Tools/Scripts/webkit-patch rollout 119737 "see https://bugs.webkit.org/show_bug.cgi?id=88561"
> 
> So that will hopefully revert http://trac.webkit.org/changeset/119737 and fix the broken waterfall bots.  (I'll check on them in a bit)
> 
> But the bug is still open... why did "webkit-patch rebaseline-expectations" create incorrect results?

Did you happen to use Mac port's build to run this command? (as supposed to Chromium Mac build).
Comment 7 Elliot Poger 2012-06-07 12:53:22 PDT
The bot is green again as of the rollout... http://build.chromium.org/p/chromium.webkit/builders/Webkit%20Mac10.6/builds/17564
Comment 8 Elliot Poger 2012-06-07 13:01:48 PDT
Let's focus on one of the tests that failed:
  css1/basic/containment.html = TEXT

How I rebaselined:
1. I updated a clean webkit tree.
2. I marked 100 lines in that tree's skia_test_expectations file with REBASELINE, including the css1/basic/containment line (line 73 of http://src.chromium.org/viewvc/chrome/trunk/src/skia/skia_test_expectations.txt?annotate=140775 )
3. I ran "Tools/Scripts/webkit-patch rebaseline-expectations", and committed its results as http://trac.webkit.org/changeset/119737

If you look at http://trac.webkit.org/changeset/119737 , you'll see that several css1/basic/containment text files were modified.  I was surprised by that, since the change I was rebaselining for should only affect minute pixel values, not actual layout results… but I figured the tool was just optimizing across platforms or something like that.
Comment 9 Ryosuke Niwa 2012-06-07 13:05:03 PDT
Since you didn't answer my question, I'll tell you that if you use rebaseline-expectations with non-Chromium port build, then it'll mess up expected results because ImageDiff in non-Chromium ports have non-zero tolerance level even if you specified the tolerance to be 0.
Comment 10 Dirk Pranke 2012-06-07 13:07:38 PDT
(In reply to comment #9)
> Since you didn't answer my question, I'll tell you that if you use rebaseline-expectations with non-Chromium port build, then it'll mess up expected results because ImageDiff in non-Chromium ports have non-zero tolerance level even if you specified the tolerance to be 0.

The code is hard-wired to skip non-chromium ports.
Comment 11 Ryosuke Niwa 2012-06-07 13:08:33 PDT
(In reply to comment #10)
> The code is hard-wired to skip non-chromium ports.

It is strange that it modified non-chromium port expected results then.
Comment 12 Elliot Poger 2012-06-07 13:14:01 PDT
(In reply to comment #9)
> Since you didn't answer my question, I'll tell you that if you use rebaseline-expectations with non-Chromium port build, then it'll mess up expected results because ImageDiff in non-Chromium ports have non-zero tolerance level even if you specified the tolerance to be 0.

I don't understand your question.  This is *exactly* what I ran on the command line:

Tools/Scripts/webkit-patch rebaseline-expectations

What does it mean to "use Mac port's build to run this command"?
Comment 13 Ryosuke Niwa 2012-06-07 13:15:33 PDT
(In reply to comment #12)
> What does it mean to "use Mac port's build to run this command"?

build-webkit instead of build-webkit --chromium.
Comment 14 Dirk Pranke 2012-06-07 13:17:22 PDT
I'm still trying to understand what happened. 

1) it looks like rebaseline-expectations incorrectly ignores the failure type in the file and will always attempt to rebaseline all of the types of expected results.

2) that said, if the chromium bots were failing with IMAGE results, I wouldn't have thought there would be .txt files to pull (have to double check this)

3) it looks like maybe optimize-baselines ran across all of the ports but skipped the mac (and maybe qt?) port somehow, and concluded that all of the other ports were matching on the text files, so they deleted the efl and gtk baselines (and the chromium baselines) and promoted them up to generic, thus causing mac to break. I don't yet know why any of this would've happened.
Comment 15 Dirk Pranke 2012-06-07 13:21:00 PDT
(In reply to comment #9)
> Since you didn't answer my question, I'll tell you that if you use rebaseline-expectations with non-Chromium port build, then it'll mess up expected results because ImageDiff in non-Chromium ports have non-zero tolerance level even if you specified the tolerance to be 0.

Also, we don't run ImageDiff as part of rebaselining. The files are de-dup'ed based on their hashes.
Comment 16 Ryosuke Niwa 2012-06-07 13:21:53 PDT
(In reply to comment #15)
> (In reply to comment #9)
> > Since you didn't answer my question, I'll tell you that if you use rebaseline-expectations with non-Chromium port build, then it'll mess up expected results because ImageDiff in non-Chromium ports have non-zero tolerance level even if you specified the tolerance to be 0.
> 
> Also, we don't run ImageDiff as part of rebaselining. The files are de-dup'ed based on their hashes.

I think optimize-rebaseline does.
Comment 17 Dirk Pranke 2012-06-07 13:34:04 PDT
(In reply to comment #16)
> > Also, we don't run ImageDiff as part of rebaselining. The files are de-dup'ed based on their hashes.
> 
> I think optimize-rebaseline does.

No, it doesn't. The only de-duping we do is during optimize-rebaseline, and that is what I was referring to.
Comment 18 Ryosuke Niwa 2012-06-07 13:36:18 PDT
(In reply to comment #17)
> (In reply to comment #16)
> > > Also, we don't run ImageDiff as part of rebaselining. The files are de-dup'ed based on their hashes.
> > 
> > I think optimize-rebaseline does.
> 
> No, it doesn't. The only de-duping we do is during optimize-rebaseline, and that is what I was referring to.

That is odd. I encountered the exactly same problem when I was using rebaseline-expectations several months ago. Maybe things have changed.
Comment 19 Ojan Vafai 2012-06-07 14:25:21 PDT
(In reply to comment #14)
> 1) it looks like rebaseline-expectations incorrectly ignores the failure type in the file and will always attempt to rebaseline all of the types of expected results.

This is def incorrect behavior, but I agree that this is what it's doing. :)

> 2) that said, if the chromium bots were failing with IMAGE results, I wouldn't have thought there would be .txt files to pull (have to double check this)

The chromium bots keep the aggregate amount of failures. So if this test ever failed text on that bot, it'll pull that result. So, it definitely can do the wrong thing here.

We should fix (1) and try the rebaseline steps again and see if that fixes it.
Comment 20 Dirk Pranke 2012-06-07 14:36:51 PDT
(In reply to comment #19)
> > 2) that said, if the chromium bots were failing with IMAGE results, I wouldn't have thought there would be .txt files to pull (have to double check this)
> 
> The chromium bots keep the aggregate amount of failures. So if this test ever failed text on that bot, it'll pull that result. So, it definitely can do the wrong thing here.
> 
> We should fix (1) and try the rebaseline steps again and see if that fixes it.

Ick, you're right, I forgot about that. It should be easy to fix (1) ...
Comment 21 Dirk Pranke 2012-06-07 14:37:20 PDT
still have no explanation for the optimize part, and that's in many ways more disturbing, though.
Comment 22 Ojan Vafai 2012-06-07 14:42:16 PDT
(In reply to comment #21)
> still have no explanation for the optimize part, and that's in many ways more disturbing, though.

Should be easy to reproduce:
1. Run webkit-patch rebaseline-test for each canary bot.
2. Look at the state of the world here to get a look for what the optimized baselines should look like.
3. Run webkit-patch optimize-baselines and see it gives you the right result.

Without knowing the state in step 2, it's hard to say if there's a bug.

I think this might just be miscommunication though. Did this cause failures on the Apple mac port as well? I think it only caused failures on the Chromium mac port, which would makes sense given the other known bug.
Comment 23 Dirk Pranke 2012-06-07 14:44:05 PDT
(In reply to comment #22)
> (In reply to comment #21)
> > still have no explanation for the optimize part, and that's in many ways more disturbing, though.
> 
> Should be easy to reproduce:
> 1. Run webkit-patch rebaseline-test for each canary bot.
> 2. Look at the state of the world here to get a look for what the optimized baselines should look like.
> 3. Run webkit-patch optimize-baselines and see it gives you the right result.
> 
> Without knowing the state in step 2, it's hard to say if there's a bug.
> 
> I think this might just be miscommunication though. Did this cause failures on the Apple mac port as well? I think it only caused failures on the Chromium mac port, which would makes sense given the other known bug.

If you look at the files affected in the CL, we're modifying baselines in the generic directory and in the gtk and efl directories, none of which I would expect.
Comment 24 Elliot Poger 2012-06-07 15:07:05 PDT
Until we figure out all this stuff, I'm going to roll out my original change ( https://src.chromium.org/viewvc/chrome?view=rev&revision=140760 ) in as orderly a fashion as possible... the longer we leave 1800 test suppressions in place, the more new failures will be obscured.

Updates on this "retreat" will be in http://crbug.com/131189 .
Comment 25 Dirk Pranke 2012-06-07 15:11:32 PDT
I am working on making rebaseline-expectations only pull the right suffixes in bug 88581.
Comment 26 Ojan Vafai 2012-06-07 15:29:11 PDT
(In reply to comment #23)
> If you look at the files affected in the CL, we're modifying baselines in the generic directory and in the gtk and efl directories, none of which I would expect.

No, that is expected behavior. optimize-baselines optimizes globally. There's no reason for it to restrict optimizations to one port. Nearly everytime I've done rebaselines it has correctly modified the gtk/efl directories to remove unneeded duplicates.
Comment 27 Dirk Pranke 2012-06-07 15:33:24 PDT
(In reply to comment #26)
> (In reply to comment #23)
> > If you look at the files affected in the CL, we're modifying baselines in the generic directory and in the gtk and efl directories, none of which I would expect.
> 
> No, that is expected behavior. optimize-baselines optimizes globally. There's no reason for it to restrict optimizations to one port. Nearly everytime I've done rebaselines it has correctly modified the gtk/efl directories to remove unneeded duplicates.

Okay, I was still surprised by the generic dir mods, but I'm happy to have epoger try again after the other fix lands.
Comment 28 Ryosuke Niwa 2012-06-08 18:36:39 PDT
Is this bug still open? Or should we just close it?
Comment 29 Elliot Poger 2012-06-11 07:11:12 PDT
[old title: 'REGRESSION(r119737): This patch broke a bunch of mac bots (Requested by epoger on #webkit).']

Renaming to reflect the core problem we encountered here, which has supposedly been fixed in https://bugs.webkit.org/show_bug.cgi?id=88581 .

*** This bug has been marked as a duplicate of bug 88581 ***