RESOLVED FIXED Bug 90892
results.html should handle flaky tests differently
https://bugs.webkit.org/show_bug.cgi?id=90892
Summary: results.html should handle flaky tests differently
Ojan Vafai
Reported 2012-07-10 10:00:46 PDT
We should have two flaky lists:
1. Tests that fail the first run and pass the second.
2. Tests that fail both runs, but in different ways.

List 1 should come after tests that timed out and tests with stderr output (before "expected to fail but passed"). List 2 should stay where the flaky tests currently are.

List 1 is consistently just noise that makes the page harder to make sense of. Also, I frequently want to flag the list of reliable failures to rerun, and it's annoying to have to tab through all the flaky passes to get to the timeouts. A minimal sketch of the proposed split follows, assuming each flaky test's per-run outcomes are available as an ordered list of result strings; the function and data here are illustrative only, not the actual results.html code (which is JavaScript).
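```python
# Illustrative sketch of the proposed two-list split; names and data are
# hypothetical. Assumes per-run outcomes like ["TEXT", "PASS"].

def split_flaky_tests(flaky_results):
    """flaky_results: dict mapping test name -> list of per-run result strings."""
    passed_on_retry = {}      # list 1: failed the first run, passed the second
    failed_differently = {}   # list 2: failed both runs, but in different ways
    for test, runs in flaky_results.items():
        if runs[-1] == "PASS":
            passed_on_retry[test] = runs
        elif len(set(runs)) > 1:
            failed_differently[test] = runs
    return passed_on_retry, failed_differently


flaky = {
    "fast/js/a.html": ["TEXT", "PASS"],
    "fast/js/b.html": ["TEXT", "IMAGE"],
}
list1, list2 = split_flaky_tests(flaky)
# list1 -> {"fast/js/a.html": ["TEXT", "PASS"]}
# list2 -> {"fast/js/b.html": ["TEXT", "IMAGE"]}
```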
Attachments
Patch (5.88 KB, patch)
2012-07-17 10:59 PDT, Ojan Vafai
dpranke: review+
Dirk Pranke
Comment 1 2012-07-11 12:23:01 PDT
I'm sure you realize this already, but we don't currently have a way to compare the output from the two runs to see if they are different. I don't know that it would be particularly hard to add that. Also, I'm a bit concerned that implementing this just makes it even easier to ignore the tests in list 1, which seem like they should either be marked as expected flaky or actually be fixed.
Ojan Vafai
Comment 2 2012-07-11 15:05:24 PDT
(In reply to comment #1)
> I'm sure you realize this already, but we don't currently have a way to compare the output from the two runs to see if they are different. I don't know that it would be particularly hard to add that.

full_results.json, which is what results.html uses, has this information and shows it in the UI already. We don't technically know which run was first and which was second, and we don't have the -actual.* files for the first run, but we do have the type of failure for each run.

> Also, I'm a bit concerned that implementing this just makes it even easier to ignore the tests in list 1, which seem like they should either be marked as expected flaky or actually be fixed.

That's true. In practice, I think list 1 is already ignored, so the cost is that other, non-flaky failures get missed.

In fact, upon further thought, I think we should hide list 1 by default. There's just too much noise right now in the results.html output.

For reference, here is a rough sketch of pulling that per-run failure information out of full_results.json. The ADD_RESULTS(...) wrapper and the "tests"/"actual" field names reflect my recollection of the format and should be treated as assumptions rather than verified against the current code.
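```python
# Rough sketch only; field names and the JSONP wrapper are assumptions.
import json

def load_full_results(path):
    with open(path) as f:
        text = f.read()
    # Strip a JSONP-style wrapper such as ADD_RESULTS(...); if present.
    start, end = text.find("{"), text.rfind("}")
    return json.loads(text[start:end + 1])

def walk_tests(trie, prefix=""):
    """Yield (test name, per-run results) for every leaf in the "tests" trie."""
    for name, node in trie.items():
        full_name = prefix + name
        if "actual" in node:
            # "actual" holds one result per run, space-separated, e.g. "TEXT PASS".
            yield full_name, node["actual"].split()
        else:
            yield from walk_tests(node, full_name + "/")

results = load_full_results("layout-test-results/full_results.json")
flaky = {name: runs for name, runs in walk_tests(results["tests"])
         if len(runs) > 1}
```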
Dirk Pranke
Comment 3 2012-07-11 15:11:13 PDT
(In reply to comment #2)
> (In reply to comment #1)
> > I'm sure you realize this already, but we don't currently have a way to compare the output from the two runs to see if they are different. I don't know that it would be particularly hard to add that.
>
> full_results.json, which is what results.html uses, has this information and shows it in the UI already. We don't technically know which run was first and which was second, and we don't have the -actual.* files for the first run, but we do have the type of failure for each run.

True. Shouldn't we have the -actuals for the first run as well?

> > Also, I'm a bit concerned that implementing this just makes it even easier to ignore the tests in list 1, which seem like they should either be marked as expected flaky or actually be fixed.
>
> That's true. In practice, I think list 1 is already ignored, so the cost is that other, non-flaky failures get missed.
>
> In fact, upon further thought, I think we should hide list 1 by default. There's just too much noise right now in the results.html output.

That would be hiding unexpected behavior, which seems kinda bad. If others thought this was a good idea, I'd be willing to give it a shot, though.
Ojan Vafai
Comment 4 2012-07-11 15:18:28 PDT
(In reply to comment #3)
> (In reply to comment #2)
> > (In reply to comment #1)
> > > I'm sure you realize this already, but we don't currently have a way to compare the output from the two runs to see if they are different. I don't know that it would be particularly hard to add that.
> >
> > full_results.json, which is what results.html uses, has this information and shows it in the UI already. We don't technically know which run was first and which was second, and we don't have the -actual.* files for the first run, but we do have the type of failure for each run.
>
> True. Shouldn't we have the -actuals for the first run as well?

Do we? Where do we store them? I thought the second run overwrote the first one.

> > > Also, I'm a bit concerned that implementing this just makes it even easier to ignore the tests in list 1, which seem like they should either be marked as expected flaky or actually be fixed.
> >
> > That's true. In practice, I think list 1 is already ignored, so the cost is that other, non-flaky failures get missed.
> >
> > In fact, upon further thought, I think we should hide list 1 by default. There's just too much noise right now in the results.html output.
>
> That would be hiding unexpected behavior, which seems kinda bad. If others thought this was a good idea, I'd be willing to give it a shot, though.

I suppose. I see a slew (~12) of flaky passes every time I run the tests. Maybe I just encounter this more because I usually run with -f and I'm just getting what I ask for.
Dirk Pranke
Comment 5 2012-07-11 15:55:16 PDT
The output for the retry is in layout-test-results/retries/...

That means the -actual.* output from both runs should be available on disk, so a diff between the two is possible. A small sketch of how one might compare a test's first-run output against its retry output follows, assuming the retries/ subdirectory mirrors the main results layout; the helper name and paths are illustrative, not part of the existing tooling.
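```python
# Illustrative sketch; assumes retries/ mirrors the main results layout.
import filecmp
import os

def runs_differ(results_dir, test_name):
    """Return True if the first-run and retry -actual.txt files differ."""
    base = os.path.splitext(test_name)[0] + "-actual.txt"
    first = os.path.join(results_dir, base)
    retry = os.path.join(results_dir, "retries", base)
    if not (os.path.exists(first) and os.path.exists(retry)):
        return False  # one of the runs produced no text output to compare
    return not filecmp.cmp(first, retry, shallow=False)

print(runs_differ("layout-test-results", "fast/js/flaky-test.html"))
```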
Ojan Vafai
Comment 6 2012-07-17 10:59:46 PDT
Ojan Vafai
Comment 7 2012-07-17 11:44:44 PDT