Summary: results.html should handle flaky tests differently
Product: WebKit
Component: Tools / Tests
Status: RESOLVED FIXED
Severity: Normal
Priority: P2
Version: 528+ (Nightly build)
Hardware: Unspecified
OS: Unspecified
Reporter: Ojan Vafai <ojan>
Assignee: Ojan Vafai <ojan>
CC: abarth, dpranke, kkristof, rniwa, simon.fraser
Attachments: Patch (attachment 152788)
Description
Ojan Vafai
2012-07-10 10:00:46 PDT
Comment 1

I'm sure you realize this already, but we don't currently have a way to compare the output from the two runs to see if they are different. I don't know that it would be particularly hard to add that.

Also, I'm a bit concerned that implementing this just makes it even easier to ignore the tests in list 1, which it seems should either be marked as expected flaky or actually be fixed.

Comment 2

(In reply to comment #1)
> I'm sure you realize this already, but we don't currently have a way to compare the output from the two runs to see if they are different.

full_results.json, which is what results.html uses, has this information and shows it in the UI already. We don't technically know which run was first and which was second, and we don't have the -actual.* files for the first run, but we do have the type of failure for each run.

> Also, I'm a bit concerned that implementing this just makes it even easier to ignore the tests in list 1.

That's true. In practice, I think this is already ignored, so the cost is that other, non-flaky failures get missed.

In fact, upon further thought, I think we should hide list 1 by default. There's just too much noise right now in the results.html output.

Comment 3

(In reply to comment #2)
> full_results.json, which is what results.html uses, has this information and shows it in the UI already.

True. Shouldn't we have the -actuals for the first run as well?

> In fact, upon further thought, I think we should hide list 1 by default.

That would be hiding unexpected behavior, which seems kind of bad. If others thought this was a good idea, I'd be willing to give it a shot, though.

Comment 4

(In reply to comment #3)
> True. Shouldn't we have the -actuals for the first run as well?

Do we? Where do we store them? I thought the second run overwrote the first one.

> That would be hiding unexpected behavior, which seems kind of bad.

I suppose.
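The per-run failure types discussed above come from full_results.json, where each test's outcomes across runs are recorded together (e.g. a test that failed and then passed on retry has two differing outcomes). A minimal sketch of splitting flaky results from consistent failures; the space-separated `"TEXT PASS"` shape used here is a simplified stand-in, not the tool's exact schema:

```python
def classify(results):
    """Split tests into flaky (runs disagreed) and consistently failing.

    `results` maps a test name to its space-separated per-run outcomes,
    e.g. {"fast/js/a.html": "TEXT PASS"} -- a simplified stand-in for
    the "actual" field in full_results.json.
    """
    flaky, consistent = {}, {}
    for test, actual in results.items():
        runs = actual.split()
        if len(set(runs)) > 1:       # the runs disagreed: flaky
            flaky[test] = runs
        elif runs != ["PASS"]:       # same non-pass result every run
            consistent[test] = runs
    return flaky, consistent

flaky, consistent = classify({
    "fast/js/a.html": "TEXT PASS",    # failed, then passed on retry
    "fast/js/b.html": "CRASH CRASH",  # crashed in both runs
    "fast/js/c.html": "PASS",         # no failure at all
})
print(sorted(flaky))       # ['fast/js/a.html']
print(sorted(consistent))  # ['fast/js/b.html']
```

Note that, as comment 2 says, this tells you *that* the runs disagreed and *how* each one failed, but without the first run's -actual.* files it cannot show a diff of the outputs themselves.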
Comment 5

I see a slew (~12) of flaky passes every time I run the tests. Maybe I just encounter this more because I usually run with -f and I'm just getting what I ask for.

Comment 6

The output for the retry is in layout-test-results/retries/...

Comment 7

Created attachment 152788 [details]
Patch
Committed r122864: <http://trac.webkit.org/changeset/122864>