Bug 97045 - webkit-patch rebaseline does the wrong thing
Summary: webkit-patch rebaseline does the wrong thing
Status: NEW
Product: WebKit
Component: Tools / Tests
Version: 528+ (Nightly build)
Hardware: Unspecified
OS: Unspecified
Importance: P2 Normal
Assigned To:
Reported: 2012-09-18 14:36 PST
Modified: 2012-09-19 14:14 PST


Description From 2012-09-18 14:36:54 PST
Rebaselined Windows-only failures with http://trac.webkit.org/changeset/128912, but that caused these tests to start failing on Linux, since Chromium Linux falls back to Chromium Windows and there wasn't an existing Chromium Linux-specific result. Submitted http://trac.webkit.org/changeset/128931 to fix.

When we rebaseline a test, we need to rebaseline all the ports that are implicitly changing as well, to make sure we don't cause new failures. In this case, we should notice that Chromium Linux is passing the tests and rebaseline both Chromium Linux and Chromium Win.
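
To make the fallback concrete, here's a minimal sketch of how a baseline lookup walks a port's fallback path. The port names, fallback order, and helper below are illustrative only, not the actual webkit-patch code or builders.py data:

import os

# Illustrative fallback search paths: each port looks for
# <test>-expected.txt in these directories, in order, and uses the
# first file it finds.
FALLBACK_PATHS = {
    'chromium-linux': ['platform/chromium-linux', 'platform/chromium-win',
                       'platform/chromium'],
    'chromium-win':   ['platform/chromium-win', 'platform/chromium'],
}

def resolve_baseline(port, test_name, layout_tests_dir='LayoutTests'):
    """Return the baseline file the port will actually compare against."""
    expected = test_name.replace('.html', '-expected.txt')
    for directory in FALLBACK_PATHS[port]:
        candidate = '/'.join([layout_tests_dir, directory, expected])
        if os.path.exists(candidate):
            return candidate
    # No platform-specific result: fall back to the generic one next to the test.
    return '/'.join([layout_tests_dir, expected])

# Overwriting platform/chromium-win/<test>-expected.txt therefore changes what
# chromium-linux sees too, unless a chromium-linux-specific baseline already
# shadows it earlier in the search path.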
------- Comment #1 From 2012-09-18 14:37:56 PST -------
Really this is a bug with webkit-patch rebaseline*, not garden-o-matic.
------- Comment #2 From 2012-09-18 14:42:13 PST -------
hm. I thought we had code that checked that.
------- Comment #3 From 2012-09-18 14:48:08 PST -------
We have code that lets you hardcode "platform_move_to" in builders.py to do this. But it doesn't automatically figure that out from the hypergraph. I think we just need to change that logic to do it automatically based on the hypergraph data.
------- Comment #4 From 2012-09-18 14:55:28 PST -------
Turns out I'm thinking of the code in the baseline optimizer that checks to make sure optimizing doesn't change any results. You're right, we don't have any code that checks if other ports might need to be rebaselined if we change a baseline for one port.
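
A minimal sketch of the missing check, reusing the illustrative FALLBACK_PATHS and resolve_baseline from the sketch in the description (the helper below is hypothetical, not existing webkit-patch code): before overwriting a directory other ports fall back to, pin the current result for the ports that inherit it and are passing today.

import os
import shutil

def preserve_inheriting_ports(test_name, rebaselined_dir, passing_ports,
                              layout_tests_dir='LayoutTests'):
    """Copy each passing port's current effective baseline into its own
    platform directory if that port inherits it through rebaselined_dir,
    so overwriting rebaselined_dir cannot change the port's results."""
    expected = test_name.replace('.html', '-expected.txt')
    for port in passing_ports:
        search_path = FALLBACK_PATHS[port]   # from the sketch above
        if rebaselined_dir not in search_path:
            continue  # this port never looks in the directory being rewritten
        current = resolve_baseline(port, test_name)
        own_dir = search_path[0]             # the port's most specific directory
        if own_dir in current:
            continue  # the port already has its own baseline; it is unaffected
        dest = '/'.join([layout_tests_dir, own_dir, expected])
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        shutil.copyfile(current, dest)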
------- Comment #5 From 2012-09-19 11:20:27 PST -------
I knew about this issue when I designed the algorithm, but it's not clear to me how to fix it.  The problem is that you often want to change the results for the other ports.

Consider the case where revision N changes the results of test X and previously all the Windows versions had the same results for test X.  For whatever reason, the WinXP bot hasn't processed revision N yet, so that bot still sees the old result.  The right solution here is to guess that the result is going to stay the same across Windows versions and to overwrite the chromium-win results with the results from the Win7 bot.  If we later discover that WinXP has a different result, we can then record that in the chromium-win-xp directory.

The algorithm is designed to be eventually correct.  If you keep rebaselining, you'll eventually get to the right state.  You just might not get there in one step.  It's impossible to always know what the correct final configuration is, so we have some basic heuristics in place.  It's very likely there are ways to improve the heuristics.
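
Concretely, the two-step convergence looks like this toy model, using an illustrative Windows fallback order (not the real builders.py data):

# Illustrative fallback order for the Windows ports in the example above.
WIN_FALLBACK = {
    'chromium-win-7':  ['platform/chromium-win', 'platform/chromium'],
    'chromium-win-xp': ['platform/chromium-win-xp', 'platform/chromium-win',
                        'platform/chromium'],
}

def effective_result(port, baselines):
    """baselines maps directory -> result text; first hit on the path wins."""
    for directory in WIN_FALLBACK[port]:
        if directory in baselines:
            return baselines[directory]
    return 'generic'

baselines = {'platform/chromium-win': 'old'}          # state before revision N

# Step 1: the Win7 bot has the new result, the XP bot is behind; guess the
# result is shared across Windows versions and overwrite the common directory.
baselines['platform/chromium-win'] = 'new-win7'
assert effective_result('chromium-win-xp', baselines) == 'new-win7'  # a guess

# Step 2: the XP bot catches up and genuinely differs; record its result in
# the version-specific directory, which shadows the shared one.
baselines['platform/chromium-win-xp'] = 'new-xp'
assert effective_result('chromium-win-xp', baselines) == 'new-xp'
assert effective_result('chromium-win-7', baselines) == 'new-win7'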
------- Comment #6 From 2012-09-19 12:16:55 PST -------
The issue with the current heuristics is that they break even if all the bots have run the tests. Maybe we should actually use different heuristics for the unexpected failures tab vs. the expected failures tab. In the former, the common case is that not all the bots have run; in the latter it's the opposite. We could make rebaselines from the expected failures tab always do the 100% right thing and the unexpected failures tab do the right thing most of the time (i.e. what it does now).

WDYT?
------- Comment #7 From 2012-09-19 12:29:48 PST -------
I think there's also potentially the problem that we're not giving enough information / direction to the rebaselining, e.g., it can be unclear whether we mean "change the win result and push the old one to linux" or "update the win result; we want linux to get the updated result also". I think the tooling lets you be explicit about this (i.e., we have the right infrastructure) but we may not have the best interface to it and users may not realize the side effects of their actions.

re: "it's impossible to always know what the correct final configuration is" ... I don't quite understand this; assuming all the bots have produced results, and you know which bots are failing and which aren't, you can determine what the correct configuration is, right? So you're saying just one or both of those assumptions might not hold? or am I missing something?
------- Comment #8 From 2012-09-19 13:11:08 PST -------
> Maybe we should actually use different heuristics for the unexpected failures tab vs the expected failures tab.

That makes sense.  The time-skew issue is much less likely to occur for the expected failures tab.

There's still the issue of configurations that don't have bots, but IMHO we should just delete those configurations.  That's mostly what we've been doing (e.g., the google-chrome configuration is gone).  Do we have any left?

> re: "it's impossible to always know what the correct final configuration is" ... I don't quite understand this; assuming all the bots have produced results, and you know which bots are failing and which aren't, you can determine what the correct configuration is, right? So you're saying just one or both of those assumptions might not hold? or am I missing something?

You can never know for certain that all the bots have produced a consistent set of results because the bots aren't synchronized.  Ojan's point is that the time-skew issue is less likely to be a problem on the expected failures tab.  It's definitely a problem on the unexpected failures tab.
------- Comment #9 From 2012-09-19 13:16:18 PST -------
(In reply to comment #8)
> 
> You can never know for certain that all the bots have produced a consistent set of results because the bots aren't synchronized.  

Got it, thanks.
------- Comment #10 From 2012-09-19 14:13:16 PST -------
(In reply to comment #8)
> There's still the issue of configurations that don't have bots, but IMHO we should just delete those configurations.

I agree. Any configurations without bots cannot be expected to have their expected results kept up to date by other ports. This includes, for example, keeping pixel results up to date for ports that don't run pixel tests.
------- Comment #11 From 2012-09-19 14:14:13 PST -------
In either case, sounds like we have consensus on a path forward. Just need to find someone with time to make it happen. :) I'm gardening next week, so maybe I'll have some time to spare.