Bug 58122 - chromium rebaseline script did the wrong thing with snowleopard expectations
Summary: chromium rebaseline script did the wrong thing with snowleopard expectations
Status: RESOLVED WONTFIX
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests (show other bugs)
Version: 528+ (Nightly build)
Hardware: PC OS X 10.5
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-04-08 01:05 PDT by Ojan Vafai
Modified: 2012-03-12 19:39 PDT (History)
7 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Ojan Vafai 2011-04-08 01:05:26 PDT
The script did the wrong thing here http://trac.webkit.org/changeset/83024, which was fixed by http://trac.webkit.org/changeset/83030.

I'll draw a hypothetical, simplified version of the error case for foo/baz.html.

1. The test has a result in platform/chromium-mac that actually matches Leopard, but not SnowLeopard.
2. Added "REBASELINE MAC : foo/baz.html = TEXT" to test_expectations.txt.
3. Ran the tool, resulting in r83024.

Here's what I think happened: Since the Leopard bot was passing, no results were downloaded. But the results from the SnowLeopard bot were put in chromium-mac. The results in platform/chromium-mac now matched the results in platform/mac, so they were deleted in r83024.

Instead, the chromium-mac results should have been moved to chromium-mac-leopard (which is was r83030 achieved).

FWIW, r83030 was accomplished by adding the lines back to test_expectations.txt and running the rebaseline tool again. So it worked well for the case of tests that pass on SnowLeopard but fail on Leopard.
Comment 1 Dirk Pranke 2011-04-08 09:27:17 PDT
Hm. So the chromium-mac directory actually contained baselines that would pass on Leopard but fail on SL? That certainly seems plausible, at least when introducing new o/s versions.

Unfortunately, fixing this so that the script is smart enough to realize that a baseline should move *down* the search path rather than up (or be deleted) requires the code to have a more holistic point of view than it currently does (since it basically only does a port at a time).

I wonder if there's a way to change the code to update/move/delete files in two passes, or if it's just best to leave this as an edge case and hope the new tool koz is working on will deal with this.

It seems like the "workaround" would be to just re-run the tool after the bots had cycled again, at which point SL would pass but Leopard would fail, and the tool could rebaseline the Leopard bots to work properly.

Also adding Adrienne to this, as she might've seen a similar problem a couple days ago.
Comment 2 Adrienne Walker 2011-04-08 09:50:53 PDT
(In reply to comment #1)

> Also adding Adrienne to this, as they might've seen a similar problem a couple days ago.

Yeah, I made a very similar attempt to rebaseline those exact same tests, but was too swamped with gardening to do anything more than just revert at the time.  I was going to file it today, but you beat me to it.  :)
Comment 3 Ojan Vafai 2011-04-10 17:46:04 PDT
The way I was picturing this working is the following:
When rebaselining any platform's results, we find the first result in it't it's baseline search path and copy it to that platform's result. Then we grab the results off the bot and dedupe. It's a good deal more work on the filesystem, but I don't think the filesystem work is significant in terms of how long this script takes to run.

So, in this case, to start with, there's a result in chromium-mac and we want to rebaseline snowleopard. So we do the following:
1. Copy the result from chromium-mac to chromium-mac-leopard and chromium-mac-snowleopard.
2. Grab the snowleopard result from the bot.
3. Dedupe

Given that we have coverage for all platforms now, that should work, right?
Comment 4 Dirk Pranke 2011-04-12 14:11:28 PDT
(In reply to comment #3)
> The way I was picturing this working is the following:
> When rebaselining any platform's results, we find the first result in it't it's baseline search path and copy it to that platform's result. Then we grab the results off the bot and dedupe. It's a good deal more work on the filesystem, but I don't think the filesystem work is significant in terms of how long this script takes to run.
> 
> So, in this case, to start with, there's a result in chromium-mac and we want to rebaseline snowleopard. So we do the following:
> 1. Copy the result from chromium-mac to chromium-mac-leopard and chromium-mac-snowleopard.

I'm not sure I follow this algorithm. Partially, I think this is because your example doesn't match the previous paragraph. If we were rebaselining just SL, why would be touch chromium-mac-leopard (since that's not in the path)?

But, I'll assume that's a typo and try to interpret the first paragraph. 

The way the code works now, it processes a port at a time, depth first, so SL is processed, then Leopard, then Win7, then Vista, etc. 

So, first we run SL, and SL updates chromium-mac with the new baseline. Then we run Leopard, and at that point have lost access to the original chromium-mac baseline (which was really a Leopard baseline).

To fix this, we'd have to do multiple passes down the port list first to copy the baselines into the version-specific dir, and then update all of the baselines, and then de-dup in a third pass (might be able to copy this into the second pass).

I'm a bit worried that this might be a fair amount of surgery on the script and might destabilize other things. Given that this will likely only come up when we introduce new versions of operating systems, I'm not sure if it's worth the effort. Thoughts?
Comment 5 James Kozianski 2011-04-12 17:10:43 PDT
In theory the new rebaseline tool would handle this case correctly. Results from a chromium-snow-leopard bot wouldn't make it into the chromium-mac directory unless all the children of chromium-mac had equal results in them.

The algorithm that the new tool uses is

1. Construct a tree out of all the baseline search paths for all the ports
2. Retrieve new baselines for the given platforms and insert them into this in-memory tree in the port-specific result directory
3. Dedupe

So if I understand correctly [1] this case should work.


[1] As I understand it

* The chromium-mac directory had results X in it
* The chromium-mac-leopard directory had no results in it (thus inheriting chromium-mac results)
* chromium-mac-leopard and chromium-mac-snowleopard both fall back directly onto chromium-mac

We did a rebaseline for MAC, which caused new results for chromium-mac-snowleopard to get pulled down, and these were copied into chromium-mac, thus changing the results that chromium-mac-leopard inherited which was incorrect.
Comment 6 Dirk Pranke 2011-04-12 17:17:02 PDT
(In reply to comment #5)
> In theory the new rebaseline tool would handle this case correctly. Results from a chromium-snow-leopard bot wouldn't make it into the chromium-mac directory unless all the children of chromium-mac had equal results in them.
> 

Ack, no, that's not what should happen. Until Lion is released, chromium-mac-snowleopard (the port) stores its baselines in chromium-mac (the directory). So, it is okay for a test to exist in both chromium-mac and chromium-mac-leopard.

> The algorithm that the new tool uses is
> 
> 1. Construct a tree out of all the baseline search paths for all the ports
> 2. Retrieve new baselines for the given platforms and insert them into this in-memory tree in the port-specific result directory
> 3. Dedupe
> 
> So if I understand correctly [1] this case should work.
>

Your understanding of [1] matches mine, and I think that it is even correct :)

That said, how do you know if there are "new" baselines? Given that the checked-in baseline in chromium-mac (the directory) currently passes on chromium-mac-leopard (the bot), there are no "new" leopard baselines until after the snowleopard versions overwrite the leopard versions, and the leopard bot starts to fail.

-- Dirk
Comment 7 James Kozianski 2011-04-12 20:04:44 PDT
(In reply to comment #6)
> Ack, no, that's not what should happen. Until Lion is released, chromium-mac-snowleopard (the port) stores its baselines in chromium-mac (the directory). So, it is okay for a test to exist in both chromium-mac and chromium-mac-leopard.

Ah, okay. Would it be possible to change this behaviour? It seems simpler if all builders point to a specific directory (ie: the port they themselves are running), and we have directories like chromium-mac as nothing more than a shorthand for chromium-mac-*.

So the correct behaviour in this situation is to put the SL results in chromium-mac, and move the original chromium-mac results into chromium-mac-leopard? Or in other words, before overwriting results in directory X, copy those results to all the directories that currently fall back onto X. It shouldn't be too hard to add that logic to the new tool.


> Your understanding of [1] matches mine, and I think that it is even correct :)

I'm glad. Sometimes these rebaseline discussions can be hard to follow :-)

> That said, how do you know if there are "new" baselines? Given that the checked-in baseline in chromium-mac (the directory) currently passes on chromium-mac-leopard (the bot), there are no "new" leopard baselines until after the snowleopard versions overwrite the leopard versions, and the leopard bot starts to fail.

The new tool has an understanding of the complete fallback order, so it will be able to tell that changing chromium-mac will also change chromium-mac-leopard if the latter is empty and we can add custom logic for handling those cases (such as the logic described above).
Comment 8 Dirk Pranke 2011-04-12 22:25:36 PDT
(In reply to comment #7)
> (In reply to comment #6)
> > Ack, no, that's not what should happen. Until Lion is released, chromium-mac-snowleopard (the port) stores its baselines in chromium-mac (the directory). So, it is okay for a test to exist in both chromium-mac and chromium-mac-leopard.
> 
> Ah, okay. Would it be possible to change this behaviour? It seems simpler if all builders point to a specific directory (ie: the port they themselves are running), and we have directories like chromium-mac as nothing more than a shorthand for chromium-mac-*.
> 

No, this is very much by design. See the writeup I sent to chromium-dev a couple weeks ago for the rationale on this:

http://groups.google.com/a/chromium.org/group/chromium-dev/browse_thread/thread/52a01dd2c7380457/a6f57f5b912b9cc8?lnk=raot&pli=1

> So the correct behaviour in this situation is to put the SL results in chromium-mac, and move the original chromium-mac results into chromium-mac-leopard?

Yes.

> Or in other words, before overwriting results in directory X, copy those results to all the directories that currently fall back onto X. It shouldn't be too hard to add that logic to the new tool.
>

I don't know that this will work as a general rule, but it might. This is part of the destabilization I was concerned about before.
Comment 9 James Kozianski 2011-04-12 23:53:55 PDT
> No, this is very much by design. See the writeup I sent to chromium-dev a couple weeks ago for the rationale on this:
> 
> http://groups.google.com/a/chromium.org/group/chromium-dev/browse_thread/thread/52a01dd2c7380457/a6f57f5b912b9cc8?lnk=raot&pli=1

Ah, I see. Thanks for the explanation!

> > Or in other words, before overwriting results in directory X, copy those results to all the directories that currently fall back onto X. It shouldn't be too hard to add that logic to the new tool.
> >
> 
> I don't know that this will work as a general rule, but it might. This is part of the destabilization I was concerned about before.

So the problem in abstract terms would be that when rebaselining ports that other ports fall back onto, you can get into an incorrect state. My thinking is that you can't get into an incorrect state if you rebaseline entire trees of dependent ports at once (which is essentially what a request to rebaseline MAC should do).

(As an aside, I think that potential damage / confusion caused by a surprising rebaseline could be lessened significantly if the tool emitted justifications for its actions.)
Comment 10 Dirk Pranke 2011-04-13 12:44:27 PDT
(In reply to comment #9)
> > No, this is very much by design. See the writeup I sent to chromium-dev a couple weeks ago for the rationale on this:
> > 
> > http://groups.google.com/a/chromium.org/group/chromium-dev/browse_thread/thread/52a01dd2c7380457/a6f57f5b912b9cc8?lnk=raot&pli=1
> 
> Ah, I see. Thanks for the explanation!
> 
> > > Or in other words, before overwriting results in directory X, copy those results to all the directories that currently fall back onto X. It shouldn't be too hard to add that logic to the new tool.
> > >
> > 
> > I don't know that this will work as a general rule, but it might. This is part of the destabilization I was concerned about before.
> 
> So the problem in abstract terms would be that when rebaselining ports that other ports fall back onto, you can get into an incorrect state. My thinking is that you can't get into an incorrect state if you rebaseline entire trees of dependent ports at once (which is essentially what a request to rebaseline MAC should do).
> 

Yes, I agree that "all at once" should be the safe and right thing to do. I just think we need to stare carefully at whatever algorithm we implement to do everything at once to make sure we're not missing some use case.

And yes, r-c-w-t definitely could use some better logging.
Comment 11 Dirk Pranke 2012-03-12 18:26:27 PDT
ojan, do you think we can close this now, or is this still an issue?