96041 – Chromium Linux EWS bots and CQ bots are flaky

RESOLVED WORKSFORME 96041

https://bugs.webkit.org/show_bug.cgi?id=96041

Tony Chang

Reported 2012-09-06 16:56:25 PDT

The bots keep failing the layout tests and retrying. This is causing the queue to get really slow. Filing this bug for tracking and discussion.

Looking at the logs, it looks like the platform/chromium-linux/compositing/gestures tests are often failing image diffs. I ssh'ed to the machine and looked at the results. The actual results for some of those tests are a solid black 800x600 png. We don't see this failure on the build.webkit.org or build.chromium.org waterfalls. These tests should use the software path, right?

I see a few other failures, but it's not clear to me if the bots would process faster if we marked platform/chromium-linux/compositing/gestures as flaky.
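For anyone triaging similar results, the solid black check can be scripted instead of opening each image by hand. The following is a minimal sketch, not part of the WebKit tooling: the names `decode_scanlines` and `is_solid_black` are made up for illustration, and it assumes 8-bit, non-interlaced PNGs (grayscale, RGB, or with alpha) and does not verify chunk CRCs.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Bytes per pixel for the 8-bit color types handled here:
# 0 = grayscale, 2 = RGB, 4 = grayscale + alpha, 6 = RGBA.
BYTES_PER_PIXEL = {0: 1, 2: 3, 4: 2, 6: 4}


def _paeth(a, b, c):
    """Paeth predictor as defined by the PNG specification."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    return b if pb <= pc else c


def decode_scanlines(png_bytes):
    """Return (bytes_per_pixel, rows) with the PNG scanline filters undone."""
    if png_bytes[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos, header, idat = 8, None, b""
    while pos + 8 <= len(png_bytes):
        length, kind = struct.unpack(">I4s", png_bytes[pos:pos + 8])
        data = png_bytes[pos + 8:pos + 8 + length]
        pos += 12 + length  # length, type, data, CRC (CRC is not verified)
        if kind == b"IHDR":
            header = struct.unpack(">IIBBBBB", data)
        elif kind == b"IDAT":
            idat += data
        elif kind == b"IEND":
            break
    if header is None:
        raise ValueError("missing IHDR chunk")
    width, height, depth, color_type, _, _, interlace = header
    if depth != 8 or interlace != 0 or color_type not in BYTES_PER_PIXEL:
        raise ValueError("PNG variant not handled by this sketch")
    bpp = BYTES_PER_PIXEL[color_type]
    stride = width * bpp
    raw = zlib.decompress(idat)
    rows, prev = [], bytearray(stride)
    for y in range(height):
        start = y * (stride + 1)  # each scanline is prefixed by a filter byte
        filter_type = raw[start]
        line = bytearray(raw[start + 1:start + 1 + stride])
        for i in range(stride):
            left = line[i - bpp] if i >= bpp else 0
            up = prev[i]
            up_left = prev[i - bpp] if i >= bpp else 0
            if filter_type == 0:
                predictor = 0
            elif filter_type == 1:
                predictor = left
            elif filter_type == 2:
                predictor = up
            elif filter_type == 3:
                predictor = (left + up) // 2
            elif filter_type == 4:
                predictor = _paeth(left, up, up_left)
            else:
                raise ValueError("unknown PNG filter type")
            line[i] = (line[i] + predictor) & 0xFF
        rows.append(line)
        prev = line
    return bpp, rows


def is_solid_black(png_bytes):
    """True if every pixel's color channels are 0 (alpha is ignored)."""
    bpp, rows = decode_scanlines(png_bytes)
    color_channels = bpp - 1 if bpp in (2, 4) else bpp
    for row in rows:
        for i in range(0, len(row), bpp):
            if any(row[i:i + color_channels]):
                return False
    return True
```

Calling `is_solid_black` on the bytes of each actual-result PNG in a results directory would flag the images described above; where those files live on the bot is not stated in this bug.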

Attachments
Patch (1.48 KB, patch), 2012-09-06 17:14 PDT, Tony Chang, abarth: review+

Tony Chang

Comment 1 2012-09-06 16:57:16 PDT

The platform/chromium-linux/compositing/gestures tests were added on Aug 22. It's not clear to me if the flakiness started around then or after that.

James Robinson

Comment 2 2012-09-06 17:06:00 PDT

Because there's "compositing" in the path these will use the h/w path (which is backed by osmesa). These are new tests and I'm not shocked that they are kind of messed up. Let's skip them or mark them flaky and let wjmaclean@ work on fixing them. They aren't worth holding everything else up.

Tony Chang

Comment 3 2012-09-06 17:14:40 PDT

Created attachment 162623: Patch

Adam Barth

Comment 4 2012-09-06 17:18:24 PDT

Comment on attachment 162623: Patch

ok

Adam Barth

Comment 5 2012-09-06 17:18:31 PDT

Thanks for investigating.

Tony Chang

Comment 6 2012-09-06 17:20:51 PDT

Committed r127803: <http://trac.webkit.org/changeset/127803>

Tony Chang

Comment 7 2012-09-06 17:21:38 PDT

Comment on attachment 162623: Patch

This is just speculative, so I'm keeping the bug open. Hopefully the cr-linux queue will clear overnight.

Tony Chang

Comment 8 2012-09-06 18:25:02 PDT

Looking at the CQ now, there are 5 runs that failed.

Fails a bunch of compositing tests:
http://webkit-commit-queue.appspot.com/results/13778383

2 http cache tests with missing results:
http://webkit-commit-queue.appspot.com/results/13775546
http://webkit-commit-queue.appspot.com/results/13785213
http://webkit-commit-queue.appspot.com/results/13785209
http://webkit-commit-queue.appspot.com/results/13765808

I wonder if the http cache test failures are related to https://bugs.webkit.org/show_bug.cgi?id=93195. Not sure why they suddenly became flaky.

Tony Chang

Comment 9 2012-09-07 10:05:27 PDT

Looking at the ews bot, 2 http tests seem super flaky:
http/tests/cache/stopped-revalidation.html = MISSING
http/tests/cache/subresource-expiration-1.html = MISSING

Here are the diffs:
http://pastebin.com/hLfRDTmp
http://pastebin.com/vk2uz3dh

Looks like neither test is registering dumpAsText() and the second test is getting the output from the first test. I think we have a bug for tests getting out of sync. I'm going to mark these 2 tests as flaky while we investigate.

It looks like notifyDone is getting out of sync with the tests. Maybe we're not properly clearing the work queue between tests?
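To make the work-queue hypothesis concrete, here is a toy model of it. It is only a sketch under stated assumptions: `ToyHarness`, `run_test`, and `simulate` are invented for illustration and do not reflect how DumpRenderTree or the test runner is actually implemented. The assumption is that the first test leaves one stale notifyDone in a queue shared across tests.

```python
from collections import deque


class ToyHarness:
    """Hypothetical stand-in for a test harness with a shared work queue."""

    def __init__(self, clear_between_tests):
        self.clear_between_tests = clear_between_tests
        self.queue = deque()  # work items; may outlive the test that queued them
        self.current = None

    def dump_as_text(self):
        # Applies to whichever test is current when the item runs.
        self.current["text_mode"] = True

    def notify_done(self):
        # Also applies to the current test, even if queued by an earlier one.
        self.current["done"] = True

    def run_test(self, name, work_items):
        if self.clear_between_tests:
            self.queue.clear()
        self.current = {"name": name, "text_mode": False, "done": False}
        self.queue.extend(work_items)
        # Drain work until the current test is marked done.
        while self.queue and not self.current["done"]:
            self.queue.popleft()()
        return self.current


def simulate(clear_between_tests):
    """Return {test name: whether dumpAsText was registered before it finished}."""
    harness = ToyHarness(clear_between_tests)
    tests = [
        # Assumed behavior: the first test queues one notifyDone too many.
        ("first", [harness.dump_as_text, harness.notify_done, harness.notify_done]),
        ("second", [harness.dump_as_text, harness.notify_done]),
        ("third", [harness.dump_as_text, harness.notify_done]),
    ]
    return {name: harness.run_test(name, items)["text_mode"] for name, items in tests}
```

In this model, `simulate(False)` reports that the second test finished without registering dumpAsText, because the stale notifyDone from the first test ended it early, while `simulate(True)` reports all three tests in text mode. Whether the real failure works this way is exactly what remains to be investigated.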

Tony Chang

Comment 10 2012-09-07 10:08:46 PDT

http://trac.webkit.org/changeset/127883

James Robinson

Comment 11 2012-09-07 10:39:46 PDT

One of the platform/chromium-linux/compositing/gestures tests involves a navigation - perhaps it's mucking things up?

Tony Chang

Comment 12 2012-09-07 11:19:09 PDT

Now I'm seeing
http/tests/cache/subresource-expiration-2.html = MISSING
http/tests/cache/subresource-failover-to-network.html = MISSING

But I am able to repro with:
new-run-webkit-tests --no-new-test-results --skip-failing-tests --verbose http

I'll do some digging...

Tony Chang

Comment 13 2012-09-07 11:28:51 PDT

http://trac.webkit.org/changeset/127897 Turns out that http/tests/cache/cancel-during-revalidation-succeeded.html is causing the 2 following tests to fail. Skipping cancel-during-revalidation-succeeded.html seems to fix the problem on my machine. Nate, do you think you can take a look?
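The comment above does not say how the culprit was identified. One common way to narrow down this kind of order dependency is to bisect the tests that run before the failing one. The sketch below assumes a single culprit and deterministic results; `find_polluter` and the `victim_passes_after` callback are hypothetical names, and the callback would have to wrap a real test run (for example an invocation of new-run-webkit-tests), which is not shown.

```python
def find_polluter(earlier_tests, victim_passes_after):
    """Return the one earlier test whose presence makes the victim fail.

    earlier_tests: tests that ran before the victim, in order.
    victim_passes_after: hypothetical callback; given a list of tests, it runs
        them followed by the victim in one session and returns True if the
        victim passed.
    Returns None if the victim passes after the full list, or if there are no
    earlier tests to blame.
    """
    candidates = list(earlier_tests)
    if not candidates or victim_passes_after(candidates):
        return None
    # Invariant: the culprit is somewhere in `candidates`.
    while len(candidates) > 1:
        half = len(candidates) // 2
        first_half = candidates[:half]
        if not victim_passes_after(first_half):
            candidates = first_half
        else:
            candidates = candidates[half:]
    return candidates[0]
```

With N earlier tests this needs roughly log2(N) runs rather than N, which matters when each run involves the whole http directory. If more than one test can trigger the failure, or the failure is intermittent, the result is not reliable and each step would need to be repeated.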

Tony Chang

Comment 14 2012-09-07 13:59:29 PDT

http://trac.webkit.org/changeset/127916 is a revert of http://trac.webkit.org/changeset/127803, which skipped the compositing/gestures tests. Other compositing tests were failing the same way, so I put the gestures tests back. The cr-linux ews bot seems to be running more smoothly since skipping the http test, even with the compositing test failures.

James Robinson

Comment 15 2012-09-07 18:36:19 PDT

Skipped the directory in http://trac.webkit.org/changeset/127954. Let's see if that helps. James - can you please take a look at this when you get a chance? If it does turn out to be these tests then I'm pretty sure that indicates a real problem in the code they test that we need to address.

W. James MacLean

Comment 16 2012-09-10 06:00:23 PDT

(In reply to comment #15)
> Skipped the directory in http://trac.webkit.org/changeset/127954. Let's see if that helps. James - can you please take a look at this when you get a chance? If it does turn out to be these tests then I'm pretty sure that indicates a real problem in the code they test that we need to address.

Sure, I'll look and see what's going on.

Adam Barth

Comment 17 2012-09-10 09:56:11 PDT

We're still getting failures in platform/chromium/compositing.

Tony Chang

Comment 18 2012-09-17 12:05:31 PDT

The bots have been running OK for the past week. Maybe we should file separate bugs for the flaky HTTP test and the compositing tests and close this bug out?

Adam Barth

Comment 19 2012-09-17 12:08:14 PDT

SGTM

Tony Chang

Comment 20 2012-09-17 13:50:24 PDT

https://bugs.webkit.org/show_bug.cgi?id=96950
https://bugs.webkit.org/show_bug.cgi?id=96951

Status RESOLVED

Resolution WORKSFORME

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware Unspecified

OS Unspecified

Product WebKit

Component Tools / Tests

Assignee

Tony Chang

Reported

2012-09-06 16:56 PDT

Modified

2012-09-17 13:50 PDT

CC List

5 users
