83411 – All Mac bots occasionally fail tons of media/ sputnik/ and svg/ tests

REOPENED 83411

All Mac bots occasionally fail tons of media/ sputnik/ and svg/ tests

https://bugs.webkit.org/show_bug.cgi?id=83411

Summary All Mac bots occasionally fail tons of media/ sputnik/ and svg/ tests

Beth Dakin

Reported 2012-04-06 16:29:38 PDT

Starting around the middle of the day on April 5th, the Mac bots started occasionally failing tons of tests, primarily in media/ sputnik/ and svg/. There will be a number of good runs with the expected number of failures right now (0-10 on Lion bots), and then one or two runs with 500+ failures.

Attachments
(committed r113910, r=eric_carlson) speculative patch (3.78 KB, patch) 2012-04-11 12:43 PDT, Antonio Gomes	no flags	Details Formatted Diff Diff
rebaseline (1.31 KB, patch) 2012-04-11 22:29 PDT, Antonio Gomes	no flags	Details Formatted Diff Diff
View All Add attachment proposed patch, testcase, etc.

Radar WebKit Bug Importer

Comment 1 2012-04-06 16:30:14 PDT

<rdar://problem/11204404>

Beth Dakin

Comment 2 2012-04-06 16:51:25 PDT

It seems like generated results are being compared to the wrong expected results. Somehow, we're getting out of kilter.

Dirk Pranke

Comment 3 2012-04-06 18:10:36 PDT

Are there example logs for this I can look at? ... Chromium DRT will actually echo the URL for the test back into the output, so there's code in new-run-webkit-tests that actually asserts if the output is for the wrong test. It doesn't look like there's anything equivalent for the non-Chromium ports, so debugging this might be harder than I would like :(.

Beth Dakin

Comment 4 2012-04-09 11:49:54 PDT

Has anything changed in the test harness in the last week that could lead to this problem? This is a really terrible issue, and I agree it's hard to debug, so maybe we can just figure out what caused it to start happening. I don't know how long these links will be good for, but here's an example: This test is failing: http://trac.webkit.org/export/113594/trunk/LayoutTests/media/unsupported-rtsp.html Here's the pretty diff: http://build.webkit.org/results/Lion%20Release%20(Tests)/r113594%20(7423)/media/unsupported-rtsp-pretty-diff.html The pretty diff is comparing the results of this test: http://trac.webkit.org/export/113594/trunk/LayoutTests/media/svg-as-image-with-media-blocked.html with the expected results of media/unsupported-rtsp.html, and it only thinks that media/unsupported-rtsp.html is failing because the generated results are from media/svg-as-image-with-media-blocked.html

Simon Fraser (smfr)

Comment 5 2012-04-09 11:53:04 PDT

On the Lion Release Tests bot, this started between 113371 and 113376. <http://build.webkit.org/builders/Lion%20Release%20%28Tests%29?numbuilds=200> On Snow Leopard Release Tests, this started between 113363 and 113371. <http://build.webkit.org/builders/SnowLeopard%20Intel%20Release%20%28Tests%29?numbuilds=200>

Dirk Pranke

Comment 6 2012-04-09 17:31:12 PDT

The only thing that's changed in the harness in that timeframe that looks even remotely related is the change to run *fewer* workers on the bots: https://bugs.webkit.org/show_bug.cgi?id=83076 It's hard to imagine that that would be a problem, though. I haven't gotten to take much of a look at this yet, will keep investigating.

Simon Fraser (smfr)

Comment 7 2012-04-09 17:41:28 PDT

My guess is that there's one test that behaves badly (a media test?), and causes the off-by-one for all the other tests run in the same shard.

Dirk Pranke

Comment 8 2012-04-09 18:41:41 PDT

(In reply to comment #7) > My guess is that there's one test that behaves badly (a media test?), and causes the off-by-one for all the other tests run in the same shard. I think Simon's hypothesis is a good one. I spot-checked a few logs (from builds 7423, 7428, and 7437 on the Lion Release bot) and they all appear to have one worker that starts failing every test starting with media/nodesFromRect-shadowContent.html, so it might be that test. If you then look at the results for 7437, http://build.webkit.org/results/Lion%20Release%20(Tests)/r113622%20(7437)/results.html , you can see that the -actual output for nodesFromRect-shadowContent is repeated in the next failing test, media/remove-from-document-before-load.html . And, interestingly, the -expected for that test says "PASS: A crash did not occur when the media element was removed before loading.". So, I'm guessing that that test is actually crashing, and we're not noticing it, and somehow the internal output buffers in NRWT aren't getting emptied, so we end up reusing the same output twice and everything goes to hell after that. Note that that is the test that is being moved in r113371, so I think we have a pretty good suspect :). Maybe try skipping that test and see if the problem goes away? There's almost certainly also a bug in NRWT that maybe I introduced a few weeks ago when I was reworking the crash log code, so I can start looking for that now that I have a theory, as well.

Beth Dakin

Comment 9 2012-04-09 19:09:00 PDT

Thanks for the analysis, Dirk! I will try skipping that test, and we'll see how it goes.

Beth Dakin

Comment 10 2012-04-09 19:14:11 PDT

Skipped with http://trac.webkit.org/changeset/113661

Antonio Gomes

Comment 11 2012-04-09 20:17:59 PDT

I would live to see a backtrace of this crash, if it is confirmed.

Antonio Gomes

Comment 12 2012-04-10 13:03:30 PDT

I wonder if the above patch fixes things. We would have to unskip the test again and wait for a few bot cycles. Anyways, the changes below work fine on Gtk and Qt builds, and should go it regardless I would say. diff --git a/LayoutTests/media/nodesFromRect-shadowContent.html b/LayoutTests/media/nodesFromRect-shadowContent.html index d55610a..9fdaf7f 100644 --- a/LayoutTests/media/nodesFromRect-shadowContent.html +++ b/LayoutTests/media/nodesFromRect-shadowContent.html @@ -13,11 +13,6 @@ <script type="text/javascript" charset="utf-8"> function runTest() { - if (window.layoutTestController) { - layoutTestController.dumpAsText(); - layoutTestController.waitUntilDone(); - } - var e = {}; // Set up shortcut access to elements @@ -42,12 +37,7 @@ var shadow =['-webkit-media-controls-timeline-container', '-webkit-media-controls-play-button', '-webkit-media-controls-panel', '-webkit-media-controls']; checkShadowContent(clickX, clickY, 10, 10, 20, 20, shadow); - - if (window.layoutTestController) - layoutTestController.notifyDone(); } - - window.onload = runTest; </script> </head> <body id="body"> @@ -58,6 +48,7 @@ testExpected("video.controls", null, '!='); waitForEvent('canplaythrough', function () { runTest(); + endTest(); } ); video.src = findMediaFile("video", "content/test"); </script> diff --git a/LayoutTests/platform/gtk/media/nodesFromRect-shadowContent-expected.txt b/LayoutTests/platform/gtk/media/nodesFromRect-shadowContent-expected.txt index 390e73e..371f393 100644 --- a/LayoutTests/platform/gtk/media/nodesFromRect-shadowContent-expected.txt +++ b/LayoutTests/platform/gtk/media/nodesFromRect-shadowContent-expected.txt @@ -4,4 +4,5 @@ This test only runs in DRT! EXPECTED (video.controls != 'null') OK EVENT(canplaythrough) +END OF TEST

Antonio Gomes

Comment 13 2012-04-11 12:43:44 PDT

Created attachment 136726 [details] (committed r113910, r=eric_carlson) speculative patch After moving the test to this new location, I accidentally left behind an unneeded line: "window.onload = runTest;". It is unneeded since now the test gets triggered by "waitForEvent('canplaythrough'" Also, remove unneeded LTC calls, since they are called by the helper scripts imported (e.g. video-test.js). Lets see if the Mac bots like this more...

Simon Fraser (smfr)

Comment 14 2012-04-11 13:01:09 PDT

As well as fixing the test, we should fix the harness to ensure that bad tests can't trip it up.

Eric Carlson

Comment 15 2012-04-11 13:47:46 PDT

Please file another bug about the problem with the test harness if one doesn't already exist.

Simon Fraser (smfr)

Comment 16 2012-04-11 13:49:30 PDT

This was that bug :(

Antonio Gomes

Comment 17 2012-04-11 13:51:21 PDT

(In reply to comment #16) > This was that bug :( Right. Want me to hold on committing the test fix?

Dirk Pranke

Comment 18 2012-04-11 14:11:46 PDT

(In reply to comment #14) > As well as fixing the test, we should fix the harness to ensure that bad tests can't trip it up. I will note that fixing things completely probably includes something like the #URL comment that chromium does ... also, we don't actually know for sure that there's a bug in NRWT at the moment; it could be a bug in DRT. But that's quibbling ...

Antonio Gomes

Comment 19 2012-04-11 14:20:35 PDT

Ok, will push and leave the bug opened. Possible useful chat: <eric_carlson> tonikitoo: the changes look OK, but do you know what was causing so many test failures <eric_carlson> tonikitoo: calling notifyDone() twice? <tonikitoo> eric_carlson, maybe that, and also calling runTests twice? <tonikitoo> from window.onload= runtest and waitTillRead(runTest, ...) <eric_carlson> tonikitoo: oh, I maybe having a test continue to generate output after it claimed to have finished... <tonikitoo> that is my guess

Dirk Pranke

Comment 20 2012-04-11 16:09:10 PDT

(In reply to comment #19) > Ok, will push and leave the bug opened. > > Possible useful chat: > > <eric_carlson> tonikitoo: the changes look OK, but do you know what was causing so many test failures > <eric_carlson> tonikitoo: calling notifyDone() twice? > <tonikitoo> eric_carlson, maybe that, and also calling runTests twice? > <tonikitoo> from window.onload= runtest and waitTillRead(runTest, ...) > <eric_carlson> tonikitoo: oh, I maybe having a test continue to generate output after it claimed to have finished... > <tonikitoo> that is my guess Wait, does this mean that calling runTest() from inside a test can make DRT generate two copies of the output? That seems bad.

Antonio Gomes

Comment 21 2012-04-11 21:28:33 PDT

seems like the patch is still failing on Mac Lion. Investigating...

Antonio Gomes

Comment 22 2012-04-11 21:48:39 PDT

(In reply to comment #21) > seems like the patch is still failing on Mac Lion. Investigating... It seems to need rebaseline: --- /Volumes/Data/slave/lion-intel-release-tests/build/layout-test-results/media/nodesFromRect-shadowContent-expected.txt +++ /Volumes/Data/slave/lion-intel-release-tests/build/layout-test-results/media/nodesFromRect-shadowContent-actual.txt @@ -3,4 +3,6 @@ This test only runs in DRT! EXPECTED (video.controls != 'null') OK +EVENT(canplaythrough) +END OF TEST http://build.webkit.org/results/Lion%20Release%20%28Tests%29/r113910%20%287504%29/media/nodesFromRect-shadowContent-diff.txt

Antonio Gomes

Comment 23 2012-04-11 22:29:26 PDT

Created attachment 136826 [details] rebaseline

WebKit Review Bot

Comment 24 2012-04-11 23:05:45 PDT

Comment on attachment 136826 [details] rebaseline Clearing flags on attachment: 136826 Committed r113944: <http://trac.webkit.org/changeset/113944>

WebKit Review Bot

Comment 25 2012-04-11 23:05:52 PDT

All reviewed patches have been landed. Closing bug.

Antonio Gomes

Comment 26 2012-04-12 05:39:02 PDT

Tim Horton

Comment 27 2012-09-06 18:32:14 PDT

Has anyone seen this recently? Should this bug be closed?

Simon Fraser (smfr)

Comment 28 2012-09-07 11:02:16 PDT

I think we're keeping this open to fix: "As well as fixing the test, we should fix the harness to ensure that bad tests can't trip it up."

Note You need to log in before you can comment on or make changes to this bug.

Status REOPENED

Resolution

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware Mac

OS Unspecified

Product WebKit

Component Tools / Tests

Assignee

Nobody

Reported

2012-04-06 16:29 PDT

Modified

2012-09-07 11:02 PDT History

CC List

10 users Show

URL

Keywords InRadar, LayoutTestFailure, MakingBotsRed, Regression

Depends on

Blocks