Bug 84762 - webkit-patch rebaseline-test is retrieving stale expectations
Summary: webkit-patch rebaseline-test is retrieving stale expectations
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Dirk Pranke
URL:
Keywords:
Depends on: 86242
Blocks: 86171
  Show dependency treegraph
 
Reported: 2012-04-24 14:04 PDT by Dirk Pranke
Modified: 2012-05-18 15:09 PDT (History)
5 users (show)

See Also:


Attachments
Patch (5.83 KB, patch)
2012-04-24 14:57 PDT, Dirk Pranke
no flags Details | Formatted Diff | Diff
Now with updated tests :) (15.62 KB, patch)
2012-04-24 16:28 PDT, Dirk Pranke
no flags Details | Formatted Diff | Diff
update w/ more review feedback - patch for landing (15.44 KB, patch)
2012-04-24 17:04 PDT, Dirk Pranke
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Dirk Pranke 2012-04-24 14:04:56 PDT
webkit-patch rebaseline-test is fetching individual files from the layout_tests archive directories, rather than fetching them from the most recent zipfile. Since we don't empty out / clobber the layout_test_results directory before extracting the build into it, it's easy for there to be stale results in the directory (in fact, the flakiness dashboard leverages this to show you the "last failure on the bot").

There are probably three different ways to fix this:

The easiest and probably least invasive is to fetch the zip file and then extract the member instead of fetching the member. This is what rebaseline-chromium-webkit-tests used to do, and it's guaranteed to be correct. It's also faster if you realize that you can cache the zip file locally and are rebaselining a lot of files at once (although adding caching adds complexity, obviously).

The second would be to clobber the results directory - this would break the "feature" of the flakiness dashboard, which can be confusing but is also useful.

The third would be to modify the tools to be smart enough to realize that the test was only failing with (say) an IMAGE failure and to then ignore the .txt file.
Comment 1 Ojan Vafai 2012-04-24 14:11:31 PDT
As I said offline, my preference would be to do the zip-file + caching approach. In addition to fixing this bug, it would also make bulk rebaselines many orders of magnitude faster. When I was rebaselining for the Chromium Lion port, I would queue up a couple hundred rebaselines, then wait 30-60 minutes for garden-o-matic to catch up.

That said, I think we should also modify garden-o-matic to only try to rebaseline the expectations type that is failing (e.g. only the png if it's a IMAGE failure), but that's a separate bug. Notably, I think we want UI in garden-o-matic to just rebaseline one of the txt or the png for cases where one of them is correct.
Comment 2 Dirk Pranke 2012-04-24 14:57:16 PDT
Created attachment 138655 [details]
Patch
Comment 3 Dirk Pranke 2012-04-24 14:58:39 PDT
Patch uploaded, but this is intended solely as a workaround until abarth can chime in on this.
Comment 4 Ojan Vafai 2012-04-24 15:40:35 PDT
Comment on attachment 138655 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=138655&action=review

> Tools/Scripts/webkitpy/tool/commands/rebaseline.py:143
> +        # correct results. Also, this fetch_from_zip test is really sleazy because it
> +        # is bypassing all of the unit tests ...
> +        fetch_from_zip = self._tool.__class__.__name__ != 'MockTool'
> +        if fetch_from_zip:

Can't we just update the tests with whatever the new output is? I don't think we should be checking things like this in even temporarily. Some percentage of code rot like this tends to stick around a lot longer than expected.
Comment 5 Dirk Pranke 2012-04-24 15:52:59 PDT
(In reply to comment #4)
> Can't we just update the tests with whatever the new output is? I don't think we should be checking things like this in even temporarily.

Bah :)
Comment 6 Dirk Pranke 2012-04-24 16:28:09 PDT
Created attachment 138680 [details]
Now with updated tests :)
Comment 7 Dirk Pranke 2012-04-24 16:28:44 PDT
Comment on attachment 138680 [details]
Now with updated tests :)

Ojan is a harsh task master.
Comment 8 Ojan Vafai 2012-04-24 16:38:29 PDT
Comment on attachment 138680 [details]
Now with updated tests :)

View in context: https://bugs.webkit.org/attachment.cgi?id=138680&action=review

Thanks for fixing up the tests.

> Tools/Scripts/webkitpy/tool/commands/rebaseline.py:144
> +        # correct results. Also, this fetch_from_zip test is really sleazy because it
> +        # is bypassing all of the unit tests ...

Nit: comment outdated.

> Tools/Scripts/webkitpy/tool/commands/rebaseline.py:146
> +        zip_url = self._results_url(builder_name).replace('results/layout-test-results', 'layout-test-results.zip')

Now that we don't have the else clause, this is the only use of this method. Maybe make that method do the right thing? If you hvae it call results_url instead of accumulated_results_url and then append layout-test-results.zip, I think it'll do what you want.

> Tools/Scripts/webkitpy/tool/commands/rebaseline.py:152
> +            print "  Found " + member_name

In the spirit of keeping the log output minimal, I'm not sure this is really useful. If the rebaseline continues without error, we can assume it's found. The error-case below is good though.
Comment 9 Dirk Pranke 2012-04-24 16:51:55 PDT
(In reply to comment #8)
> (From update of attachment 138680 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=138680&action=review
> 
> Thanks for fixing up the tests.
> 
> > Tools/Scripts/webkitpy/tool/commands/rebaseline.py:144
> > +        # correct results. Also, this fetch_from_zip test is really sleazy because it
> > +        # is bypassing all of the unit tests ...
> 
> Nit: comment outdated.
> 

Good catch.

> > Tools/Scripts/webkitpy/tool/commands/rebaseline.py:146
> > +        zip_url = self._results_url(builder_name).replace('results/layout-test-results', 'layout-test-results.zip')
> 
> Now that we don't have the else clause, this is the only use of this method. Maybe make that method do the right thing? If you hvae it call results_url instead of accumulated_results_url and then append layout-test-results.zip, I think it'll do what you want.

Done.

> 
> > Tools/Scripts/webkitpy/tool/commands/rebaseline.py:152
> > +            print "  Found " + member_name
> 
> In the spirit of keeping the log output minimal, I'm not sure this is really useful. If the rebaseline continues without error, we can assume it's found. The error-case below is good though.

The error case is not really an error case (since we look for a 'wav' file for every test). If you want to reduce what we log, I think it would be better to only log the ones we do find. (Will be less verbose for text-only tests, at least).
Comment 10 Ojan Vafai 2012-04-24 16:59:09 PDT
Comment on attachment 138680 [details]
Now with updated tests :)

View in context: https://bugs.webkit.org/attachment.cgi?id=138680&action=review

>>> Tools/Scripts/webkitpy/tool/commands/rebaseline.py:152
>>> +            print "  Found " + member_name
>> 
>> In the spirit of keeping the log output minimal, I'm not sure this is really useful. If the rebaseline continues without error, we can assume it's found. The error-case below is good though.
> 
> The error case is not really an error case (since we look for a 'wav' file for every test). If you want to reduce what we log, I think it would be better to only log the ones we do find. (Will be less verbose for text-only tests, at least).

oic. I guess this is better output than the old output. I'm fine with it as is. We can always tweak it later if we think it's too verbose.
Comment 11 Dirk Pranke 2012-04-24 17:01:57 PDT
(In reply to comment #9)
> > > Tools/Scripts/webkitpy/tool/commands/rebaseline.py:146
> > > +        zip_url = self._results_url(builder_name).replace('results/layout-test-results', 'layout-test-results.zip')
> > 
> > Now that we don't have the else clause, this is the only use of this method. Maybe make that method do the right thing? If you hvae it call results_url instead of accumulated_results_url and then append layout-test-results.zip, I think it'll do what you want.
>

Actually, it looks like results_url() doesn't return what you would want (it omits the "f" directive and points you to a "results" dir).
Comment 12 Dirk Pranke 2012-04-24 17:04:05 PDT
Created attachment 138694 [details]
update w/ more review feedback - patch for landing
Comment 13 Dirk Pranke 2012-04-24 17:11:04 PDT
Committed r115142: <http://trac.webkit.org/changeset/115142>
Comment 14 Eric Seidel (no email) 2012-05-10 22:59:09 PDT
I think the caching mechanism either doesn't work, or the idea of having to ever download a 34mb zip file in order to rebasline is broken. :)  Or possibly both. :)
Comment 15 Ojan Vafai 2012-05-11 00:07:07 PDT
(In reply to comment #14)
> I think the caching mechanism either doesn't work, or the idea of having to ever download a 34mb zip file in order to rebasline is broken. :)  Or possibly both. :)

Well there is no caching mechanism yet. The hypothetical caching mechanism could make this a lot faster though. :)

It is a problem though. rebaselines are really slow with these larger zip files. IMO, slow+correct is better than fast+mostly correct. But, it'd be good to get the caching in soon.
Comment 16 Ojan Vafai 2012-05-11 00:08:10 PDT
(In reply to comment #15)
> (In reply to comment #14)
> > I think the caching mechanism either doesn't work, or the idea of having to ever download a 34mb zip file in order to rebasline is broken. :)  Or possibly both. :)
> 
> Well there is no caching mechanism yet. The hypothetical caching mechanism could make this a lot faster though. :)
> 
> It is a problem though. rebaselines are really slow with these larger zip files. IMO, slow+correct is better than fast+mostly correct. But, it'd be good to get the caching in soon.

Also, theoretically, once we have the caching, rebaselines will be much faster than they were in the old scheme, not just faster than the current slowness.
Comment 17 Eric Seidel (no email) 2012-05-11 00:15:20 PDT
I suspect that the sydney office may beg to differ with your assessment that the curent situation is better than 2 weeks ago. :)
Comment 18 Adam Barth 2012-05-11 00:37:44 PDT
@noel Perhaps this is what caused Bug 86171
Comment 19 noel gordon 2012-05-11 02:21:35 PDT
Yes I believe so.
Comment 20 Dirk Pranke 2012-05-11 10:20:38 PDT
Sounds like we should either revert this, or add a flag to get the old behavior back while we're waiting for a better solution.
Comment 21 Adam Barth 2012-05-11 10:27:02 PDT
I'd be inclined to roll back the change given the performance impact.  We might want to try a different approach for solving this problem, for example by having garden-o-matic indiciate which results it wants the script to download.
Comment 22 Dirk Pranke 2012-05-11 10:36:08 PDT
Okay, I will roll it out until I can come up with a better solution.
Comment 23 Dirk Pranke 2012-05-11 10:42:36 PDT
Rollout posted in bug 86231 .
Comment 24 Dirk Pranke 2012-05-11 12:40:03 PDT
Okay, i've posted a patch for garden-o-matic to tell webkit-patch which types of results to rebaseline in bug 86242 ...
Comment 25 noel gordon 2012-05-12 03:23:55 PDT
Question: would it be possible to extract needed files from the .zip on the server-side?
Comment 26 Dirk Pranke 2012-05-13 09:41:28 PDT
(In reply to comment #25)
> Question: would it be possible to extract needed files from the .zip on the server-side?

Not the way things are currently set up; the server is a dumb file server. However, bug 86242 should fix things.
Comment 27 Dirk Pranke 2012-05-18 15:09:08 PDT
closing ... this should be fixed now.