Bug 96041

Summary: Chromium Linux EWS bots and CQ bots are flaky
Product: WebKit
Reporter: Tony Chang <tony>
Component: Tools / Tests
Assignee: Tony Chang <tony>
Status: RESOLVED WORKSFORME
Severity: Normal
CC: abarth, dpranke, jamesr, japhet, wjmaclean
Priority: P2
Version: 528+ (Nightly build)
Hardware: Unspecified
OS: Unspecified
Attachments:
Description Flags
Patch abarth: review+

Tony Chang
Reported 2012-09-06 16:56:25 PDT
The bots keep failing the layout tests and retrying, which is making the queue really slow. Filing this bug for tracking and discussion.

Looking at the logs, the platform/chromium-linux/compositing/gestures tests are often failing image diffs. I SSHed to the machine and looked at the results: the actual results for some of those tests are a solid black 800x600 PNG. We don't see this failure on the build.webkit.org or build.chromium.org waterfalls. These tests should use the software path, right?

I see a few other failures, but it's not clear to me whether the bots would process faster if we marked platform/chromium-linux/compositing/gestures as flaky.
Attachments
Patch (1.48 KB, patch)
2012-09-06 17:14 PDT, Tony Chang
abarth: review+
Tony Chang
Comment 1 2012-09-06 16:57:16 PDT
The platform/chromium-linux/compositing/gestures tests were added on Aug 22. It's not clear to me if the flakiness started around then or after that.
James Robinson
Comment 2 2012-09-06 17:06:00 PDT
Because there's "compositing" in the path these will use the h/w path (which is backed by osmesa). These are new tests and I'm not shocked that they are kind of messed up. Let's skip them or mark them flaky and let wjmaclean@ work on fixing them. They aren't worth holding everything else up.
Tony Chang
Comment 3 2012-09-06 17:14:40 PDT
Adam Barth
Comment 4 2012-09-06 17:18:24 PDT
Comment on attachment 162623 (Patch)

ok
Adam Barth
Comment 5 2012-09-06 17:18:31 PDT
Thanks for investigating.
Tony Chang
Comment 6 2012-09-06 17:20:51 PDT
Tony Chang
Comment 7 2012-09-06 17:21:38 PDT
Comment on attachment 162623 (Patch)

This is just speculative, so I'm keeping the bug open. Hopefully the cr-linux queue will clear overnight.
Tony Chang
Comment 8 2012-09-06 18:25:02 PDT
Looking at the CQ now, there are 5 runs that failed.

Fails a bunch of compositing tests:
http://webkit-commit-queue.appspot.com/results/13778383

2 http cache tests with missing results:
http://webkit-commit-queue.appspot.com/results/13775546
http://webkit-commit-queue.appspot.com/results/13785213
http://webkit-commit-queue.appspot.com/results/13785209
http://webkit-commit-queue.appspot.com/results/13765808

I wonder if the http cache test failures are related to https://bugs.webkit.org/show_bug.cgi?id=93195. Not sure why they suddenly became flaky.
Tony Chang
Comment 9 2012-09-07 10:05:27 PDT
Looking at the EWS bot, 2 http tests seem super flaky:
http/tests/cache/stopped-revalidation.html = MISSING
http/tests/cache/subresource-expiration-1.html = MISSING

Here are the diffs:
http://pastebin.com/hLfRDTmp
http://pastebin.com/vk2uz3dh

Looks like neither test is registering dumpAsText() and the second test is getting the output from the first test. I think we have a bug for tests getting out of sync. I'm going to mark these 2 tests as flaky while we investigate.

It looks like notifyDone is getting out of sync with the tests. Maybe we're not properly clearing the work queue between tests?
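
To see which tests show up as MISSING most often, the result lines from several bot runs could be tallied. This is only a rough sketch: it assumes each run's log contains lines of the form "<test path> = <RESULT>" as quoted above, and the name count_missing and its input format are hypothetical (this is not part of webkitpy).

```python
import re
from collections import Counter

# Matches lines such as "http/tests/cache/stopped-revalidation.html = MISSING".
RESULT_LINE = re.compile(r"^\s*(\S+\.html)\s*=\s*([A-Z]+)\s*$")


def count_missing(run_logs):
    """Count, per test, how many runs reported it as MISSING.

    run_logs is an iterable of strings, one per bot run (hypothetical input).
    """
    counts = Counter()
    for log in run_logs:
        seen = set()  # count each test at most once per run
        for line in log.splitlines():
            match = RESULT_LINE.match(line)
            if match and match.group(2) == "MISSING":
                seen.add(match.group(1))
        counts.update(seen)
    return counts
```

Calling count_missing(logs).most_common() would then list the tests ordered by how many runs reported them missing.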
Tony Chang
Comment 10 2012-09-07 10:08:46 PDT
James Robinson
Comment 11 2012-09-07 10:39:46 PDT
One of the platform/chromium-linux/compositing/gestures tests involves a navigation - perhaps it's mucking things up?
Tony Chang
Comment 12 2012-09-07 11:19:09 PDT
Now I'm seeing:
http/tests/cache/subresource-expiration-2.html = MISSING
http/tests/cache/subresource-failover-to-network.html = MISSING

But I am able to repro with:
new-run-webkit-tests --no-new-test-results --skip-failing-tests --verbose http

I'll do some digging...
Tony Chang
Comment 13 2012-09-07 11:28:51 PDT
http://trac.webkit.org/changeset/127897

Turns out that http/tests/cache/cancel-during-revalidation-succeeded.html is causing the 2 tests that follow it to fail. Skipping cancel-during-revalidation-succeeded.html seems to fix the problem on my machine. Nate, do you think you can take a look?
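
For what it's worth, this kind of "earlier test breaks a later one" problem can be narrowed down by bisecting the list of tests that ran before the failing one. The sketch below is not how I found it and is not part of the WebKit tools. It assumes a single polluting test and a deterministic failure, and victim_passes_after is a hypothetical callback that runs the given tests followed by the victim in one process and reports whether the victim passed.

```python
def find_polluter(candidates, victim, victim_passes_after):
    """Return the one test in candidates that makes victim fail, or None.

    candidates: tests that ran before victim, in order.
    victim_passes_after(tests, victim): hypothetical callback; returns True
    if victim passes when run right after the given tests.
    Assumes exactly one polluter and a deterministic failure.
    """
    if not candidates or victim_passes_after(candidates, victim):
        return None  # nothing to bisect, or the failure does not reproduce
    while len(candidates) > 1:
        half = len(candidates) // 2
        first = candidates[:half]
        if not victim_passes_after(first, victim):
            candidates = first  # polluter is in the first half
        else:
            candidates = candidates[half:]  # polluter is in the second half
    return candidates[0]
```

With N earlier tests this needs roughly log2(N) runs instead of N. If the failure is itself flaky, each step would have to be repeated a few times before trusting the answer.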
Tony Chang
Comment 14 2012-09-07 13:59:29 PDT
http://trac.webkit.org/changeset/127916 is a revert of http://trac.webkit.org/changeset/127803, which skipped the compositing/gestures tests. Other compositing tests were failing the same way, so I re-enabled them. The cr-linux EWS bot seems to be running more smoothly since skipping the http test, even with the compositing test failures.
James Robinson
Comment 15 2012-09-07 18:36:19 PDT
Skipped the directory in http://trac.webkit.org/changeset/127954. Let's see if that helps. James - can you please take a look at this when you get a chance? If it does turn out to be these tests then I'm pretty sure that indicates a real problem in the code they test that we need to address.
W. James MacLean
Comment 16 2012-09-10 06:00:23 PDT
(In reply to comment #15)
> Skipped the directory in http://trac.webkit.org/changeset/127954. Let's see if that helps. James - can you please take a look at this when you get a chance? If it does turn out to be these tests then I'm pretty sure that indicates a real problem in the code they test that we need to address.

Sure, I'll look and see what's going on.
Adam Barth
Comment 17 2012-09-10 09:56:11 PDT
We're still getting failures in platform/chromium/compositing.
Tony Chang
Comment 18 2012-09-17 12:05:31 PDT
The bots have been running OK for the past week. Maybe we should file separate bugs for the flaky HTTP test and the compositing tests and close this bug out?
Adam Barth
Comment 19 2012-09-17 12:08:14 PDT
SGTM