Summary: fast/loader/stateobjects/replacestate-then-pushstate.html fails intermittently on all bots
Product: WebKit
Component: Tools / Tests
Status: RESOLVED FIXED
Severity: Normal
Priority: P2
Version: 528+ (Nightly build)
Hardware: PC
OS: OS X 10.5
Reporter: Eric Seidel (no email) <eric>
Assignee: Darin Fisher (:fishd, Google) <fishd>
CC: abarth, beidson, commit-queue, eric, fishd, sam, webkit.review.bot
Bug Depends on: 36779
Bug Blocks: (none)
Attachments: v1 patch, v2 patch
Description
Eric Seidel (no email)
2010-03-26 01:25:24 PDT
Yes, it is dumping the wrong test. Yes, this has been happening with scattered tests that I added, sometimes very recently and sometimes many months ago. No, I have no idea how to explore this, especially since they're often very rare failures on the bots and I can never reproduce locally. In other words... *sigh*

Failed again:
http://build.webkit.org/results/SnowLeopard%20Intel%20Release%20(Tests)/r56733%20(7364)/fast/loader/stateobjects/replacestate-then-pushstate-diffs.txt

Possibly related ASSERT hit on Leopard in bug 36779.

Failed on the Leopard commit bot just now too:
https://bugs.webkit.org/show_bug.cgi?id=30144#c66

It appears to be failing on the Tiger bot as well:
http://build.webkit.org/results/Tiger%20Intel%20Release/r56871%20(10265)/fast/loader/stateobjects/replacestate-then-pushstate-pretty-diff.html

Seems we should skip/disable this test since it's failing on at least 3 bots. Until now, I wasn't aware this was failing every time on the bots.

(In reply to comment #5)
> Until now, I wasn't aware this was failing every time on the bots.

Checking it myself, it appears it is not failing every time.

Correct. It is not failing consistently on any machine to my knowledge. My apologies if my statements led you to believe such. This is, however, a common failure, just not a consistent one. :(

The commit-queue just hit this too:
https://bugs.webkit.org/show_bug.cgi?id=36369#c4

Tiger hit this several times today. Seems we should disable this test for all bots. Unless you have a fix in the pipeline, Brady?

(In reply to comment #9)
> Tiger hit this several times today. Seems we should disable this test for all
> bots. Unless you have a fix in the pipeline Brady?

Is this one of the tests that's "failing" by showing the output from the previous test? i.e., the test itself isn't necessarily failing, but is being polluted by something prior.

Here's my dilemma.
I would love to pour in the time to figure out what is wrong with the test harness, but I am not an expert at the test harness; I would be spinning my wheels, and I don't have time right now to spin my wheels. We can't just keep disabling tests that only have intermittent failures and (apparently) only on the bots. If I can never reproduce locally, I can never explore it. But if we just keep disabling test after test, then we lose the test coverage for when it really breaks 100% of the time, even for individual contributors on their own machines. I would love to hear referrals for people who are truly experts at run-webkit-tests and DRT, and to work with them to figure out what is going on.

The flip side of that argument is that we have >13k tests. Because the bots (Tiger and Snow Leopard especially) are constantly red, we become blind to failures and miss real regressions. If we lose 0.01% of our tests, that's a net win for the project.

However, in many cases, we've been able to keep the test coverage and remove the flakiness, even with no access to the failures locally. For example, last night I fixed a test that was failing only on Tiger with no ability to reproduce the failure myself. It just requires effort.

I don't have visibility into the other demands on your time. It's entirely possible you have more important things to be working on. However, that doesn't mean we should impose these costs on the rest of the project.

> Is this one of the tests that's "failing" by showing the output from the
> previous test? ie - the test itself isn't necessarily failing but being
> polluted by something prior.

Here's the failure pretty diff:
http://build.webkit.org/results/Tiger%20Intel%20Release/r57003%20(10355)/fast/loader/stateobjects/replacestate-then-pushstate-pretty-diff.html

By way of context, this test has failed on 8 of the last 10 Tiger bot runs.
It's one of three tests that prevent this bot from being green and put under the protection of SheriffBot (who will help prevent future real regressions). Three tests are 0.023% of our tests.

(In reply to comment #11)
> The flip side of that argument is that we have >13k tests. Because the bots
> (Tiger and Snow Leopard especially) are constantly red, we become blind to
> failures and miss real regressions. If we lose 0.01% of our tests, that's a
> net win for the project.

Unless the feature is new with only an initial batch of tests, and disabling one of the tests would result in the loss of ~10% of the test coverage for a reasonably complicated mechanism.

> However, in many cases, we've been able to keep the test coverage and remove
> the flakiness, even with no access to the failures locally. For example, last
> night I fixed a test that was failing only on Tiger with no ability to
> reproduce the failure myself. It just requires effort.

Wildly varying amounts of effort - I have spent time on this already, and no one I've reached out to has decided it was worth helping.

(In reply to comment #12)
> > Is this one of the tests that's "failing" by showing the output from the
> > previous test? ie - the test itself isn't necessarily failing but being
> > polluted by something prior.
>
> Here's the failure pretty diff:
> http://build.webkit.org/results/Tiger%20Intel%20Release/r57003%20(10355)/fast/loader/stateobjects/replacestate-then-pushstate-pretty-diff.html

This diff appears to be the previous test (fast/loader/replacestate-in-iframe.html) dumping into the test that the bots see as failing.

This appears to be a test harness problem that I would love help tracking down, because apparently every other test I've added lately has been affected by this problem somehow. Any ideas?

(CC'ing Darin Fisher, who added the test that is causing the forward failure.)

This test is also failing intermittently on the chromium bots. I'll investigate.
I can read up on how this feature is supposed to work, but the first step I would try to black-box debug this issue is to insert a trivial test between the previous test and this one. That way we can see whether the previous test is leaking state forward or this test is pulling previous state forward.

(In reply to comment #16)
> I can read up on how this feature is supposed to work, but the first step I
> would try to black-box debug this issue is to insert a trivial test between the
> previous test and this one. That way we can see whether the previous test is
> leaking state forward or this test is pulling previous state forward.

That is a good idea - except since I have never been able to reproduce this locally, and I haven't heard from any other fellow WebKit developers nearby about this test failing for them, this appears to be a bot-only issue, and your experiment would require checking in a dummy test for experimentation on the bots. Which (I don't believe) there is any precedent for.

> Which (I don't believe) there is any precedent for.
Right, I'm proposing adding a new test to debug this on the bots. Given that it fails about 80% of the time, we should get our answer in a few cycles. I've done similar experiments before to solve bot-only issues. In any case, I don't think it's a big problem to add more tests.
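For illustration, a trivial "spacer" test of the sort proposed above might look like the sketch below. This is a hedged, hypothetical example: in a real layout test, `layoutTestController` is injected by DumpRenderTree; it is stubbed here so the snippet runs standalone, and the output text is made up.

```javascript
// Minimal "spacer" layout test: it dumps a single line of text and nothing
// else. If the previous test leaks output forward, the leak shows up in this
// test's otherwise trivial results; if this test pulls state forward, the
// *next* test breaks instead.
// NOTE: layoutTestController is normally provided by DumpRenderTree; this
// stub only records calls so the sketch is runnable outside the harness.
const output = [];
const layoutTestController = {
  dumpAsText() { output.push('[dumping as text]'); },
};

layoutTestController.dumpAsText();
output.push('This spacer test intentionally does nothing.');
console.log(output.join('\n'));
```

Because the spacer neither navigates nor waits, any unexpected content in its results can only have come from its neighbors, which is exactly the signal the black-box experiment needs.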
Reading replacestate-then-pushstate.html, I wonder if the call to history.back() is causing problems, for example if the history timer is racing with some other asynchronous bit? An interesting clue is that the previous test results are dumped as a render tree and not as text.

(In reply to comment #19)
> Reading replacestate-then-pushstate.html, I wonder if the call to
> history.back() is causing problems, for example if the history timer is racing
> with some other asynchronous bit? An interesting clue is that the previous
> test results are dumped as a render tree and not as text.

Which seems to always be the case in these types of failures that we've seen lately.

Because of the varying and async nature of the test, it is - of course - a waitUntilDone test. And it seems to me that DRT should have already done its dumping for the previous test before the new test is started, then wiped that out. With those two understandings, I'm not sure what asynchronous bit might exist here.

> Which seems to always be the case in these types of failures that we've seen
> lately.

Do you have links to other failures of this kind? I'm new to the flaky test party.

> Because of the varying and async nature of the test it is - of course - a
> waitUntilDone test.

Both this test and the previous test are waitUntilDone.

> And it seems to me that DRT should have already done its dumping for the
> previous test before the new test is started, then wiped that out.

Right. The previous test passes, so DRT has decided it's happy with what happened there.

> With those two understandings, I'm not sure what asynchronous bit might exist
> here.

When I've seen leakages before, it's because a test is running code after DRT thinks that the test is done. For example, if you call alert in your unload handler, the alert will be recorded for the next test. Maybe popstate has similar timing? Is that called when you navigate away from a page? I guess I should read up on how this API works.
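For readers unfamiliar with the API under test, the sequence the test name describes can be simulated outside the browser. This is a simplified stand-in model, not WebKit's implementation: the real API lives on `window.history`, and in browsers `history.back()` dispatches a `popstate` event rather than returning silently.

```javascript
// Simplified model of the session-history entries the test manipulates.
// Each entry carries the state object passed to replaceState/pushState.
const entries = [{ state: null }];
let index = 0;

const history = {
  // replaceState swaps the state on the current entry in place.
  replaceState(state) { entries[index] = { state }; },
  // pushState adds a new entry after the current one, truncating any
  // forward history, and makes the new entry current.
  pushState(state) { entries.splice(index + 1, entries.length, { state }); index += 1; },
  // back() moves to the previous entry; a real browser would also fire a
  // popstate event here, which is the timing this bug revolves around.
  back() { if (index > 0) index -= 1; },
  get state() { return entries[index].state; },
};

history.replaceState('first');  // rewrite the initial entry's state
history.pushState('second');    // add and move to a second entry
history.back();                 // return to the rewritten first entry
console.log(history.state);     // prints "first"
```

The interesting part for this bug is the last step: returning to the rewritten entry is what triggers the popstate dispatch whose timing relative to the load event is under discussion.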
Woah, I'm now convinced that http://trac.webkit.org/browser/trunk/LayoutTests/fast/loader/stateobjects/replacestate-in-iframe.html is the problem. That test is nuts. We should run it in a popup window.

I think this is probably related to the fact that we dispatch the PopStateEvent synchronously. In fast/loader/replacestate-in-iframe.html, the PopStateEvent occurs in an iframe, and it is from there that the test calls notifyDone(). I think I'm going to approach this problem by making the test less nutty ;-)

(In reply to comment #22)
> Woah, I'm now convinced that
> http://trac.webkit.org/browser/trunk/LayoutTests/fast/loader/stateobjects/replacestate-in-iframe.html
> is the problem. That test is nuts. We should run it in popup window.

I agree that test is nutty, and am glad Darin's looking into it. I'm still at a loss on this from the high run-webkit-tests + DRT level. Could you share your understanding of why Darin's test causes this to happen? And could we fix DRT + run-webkit-tests to handle Darin's test - as nutty as it is? :)

I ask because - like I mentioned - this has come up a couple of times with bizarre loader tests I've added over the last few months, and I *really* need to understand the root cause so I can go back and re-enable disabled tests and fix other intermittent weirdness!

I believe these are two of the tests Brady is referring to:
http://trac.webkit.org/browser/trunk/LayoutTests/fast/loader/api-test-go-to-current-back-forward-item.html-disabled
http://trac.webkit.org/browser/trunk/LayoutTests/fast/loader/api-test-new-window-data-load-base-url.html-disabled

Created attachment 52435 [details]
v1 patch
(In reply to comment #24)
> And could we fix DRT + run-webkit-tests to handle Darin's test -as nutty as it
> is? :)

I looked into this, and I'm not sure it is possible. In this case, the root cause of the bug is that onpopstate sometimes runs before the page is fully loaded. During onpopstate, I was calling notifyDone. As a result, when the page finally finishes loading, it ends up dumping the test results again.

One way to avoid problems like this would be to create a new WebView for each test.

(In reply to comment #27)
> I looked into this, and I'm not sure it is possible. In this case, the root
> cause of the bug is that onpopstate sometimes runs before the page is fully
> loaded. During onpopstate, I was calling notifyDone. As a result, when the
> page finally finishes loading, it ends up dumping the test results again.
>
> One way to avoid problems like this would be to create a new WebView for each
> test.

Presumably we could make notifyDone() be a synchronous message to DRT meaning "Hey, the test is over - RIGHT NOW," and DRT could cancel the load if it's not done yet. I don't know how many other tests this would affect, but one might argue that those other tests are "wrong."

Comment on attachment 52435 [details]
v1 patch
Whoops, I just realized that this no longer tests the fix I was intending it to test! Doh!
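The root cause Darin describes in comment #27 - notifyDone() firing from the popstate handler before the load event - can be sketched as a plain ordering problem. This is a hedged miniature, not harness code: `layoutTestController` and the event handlers are stubs so the snippet runs standalone, and the "defensive" pattern at the end is just one plausible way a test could protect itself.

```javascript
// Miniature of the ordering bug: notifyDone() is called from the popstate
// handler, which can fire before the page's load event. The harness dumps
// results at notifyDone, and the still-pending load can then trigger a
// second dump that pollutes the next test's results.
const log = [];
const layoutTestController = {
  waitUntilDone() { log.push('waitUntilDone'); },
  notifyDone()    { log.push('dump results'); },  // harness dumps here
};

layoutTestController.waitUntilDone();
// Buggy ordering observed on the bots: popstate fires before load completes.
(function onpopstate() { layoutTestController.notifyDone(); })();
(function onload()     { log.push('load finished (risk of second dump)'); })();

// One way a test can defend itself: declare completion only after *both*
// the load event and the popstate event have been seen.
let sawPopstate = false, sawLoad = false;
function maybeDone() { if (sawPopstate && sawLoad) log.push('safe dump'); }
(function onpopstateFixed() { sawPopstate = true; maybeDone(); })();
(function onloadFixed()     { sawLoad = true; maybeDone(); })();

console.log(log.join(' | '));
```

Note that the defensive gating only fixes the test itself; the harness-level proposals in the thread (a new WebView per test, or a synchronous notifyDone that cancels pending loads) attack the same race from the other side.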
BTW, new-run-webkit-tests supports --run-singly, which I think uses a new DRT process for each test.

"old-" run-webkit-tests supports that too. :)

Created attachment 52475 [details]
v2 patch
Comment on attachment 52475 [details]
v2 patch
LGTM, thanks. Sounds like a cool bug.
Landed as http://trac.webkit.org/changeset/57042

http://trac.webkit.org/changeset/57042 might have broken Leopard Intel Debug (Tests)

It's really https://bugs.webkit.org/show_bug.cgi?id=36646. The failure rate from that test is insanely high. I've disabled it.