Bug 83411 - All Mac bots occasionally fail tons of media/ sputnik/ and svg/ tests
Summary: All Mac bots occasionally fail tons of media/ sputnik/ and svg/ tests
Status: REOPENED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests (show other bugs)
Version: 528+ (Nightly build)
Hardware: Mac Unspecified
: P2 Normal
Assignee: Nobody
URL:
Keywords: InRadar, LayoutTestFailure, MakingBotsRed, Regression
Depends on:
Blocks:
 
Reported: 2012-04-06 16:29 PDT by Beth Dakin
Modified: 2012-09-07 11:02 PDT (History)
10 users (show)

See Also:


Attachments
(committed r113910, r=eric_carlson) speculative patch (3.78 KB, patch)
2012-04-11 12:43 PDT, Antonio Gomes
no flags Details | Formatted Diff | Diff
rebaseline (1.31 KB, patch)
2012-04-11 22:29 PDT, Antonio Gomes
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Beth Dakin 2012-04-06 16:29:38 PDT
Starting around the middle of the day on April 5th, the Mac bots started occasionally failing tons of tests, primarily in media/ sputnik/ and svg/. There will be a number of good runs with the expected number of failures right now (0-10 on Lion bots), and then one or two runs with 500+ failures.
Comment 1 Radar WebKit Bug Importer 2012-04-06 16:30:14 PDT
<rdar://problem/11204404>
Comment 2 Beth Dakin 2012-04-06 16:51:25 PDT
It seems like generated results are being compared to the wrong expected results. Somehow, we're getting out of kilter.
Comment 3 Dirk Pranke 2012-04-06 18:10:36 PDT
Are there example logs for this I can look at? ... Chromium DRT will actually echo the URL for the test back into the output, so there's code in new-run-webkit-tests that actually asserts if the output is for the wrong test. It doesn't look like there's anything equivalent for the non-Chromium ports, so debugging this might be harder than I would like :(.
Comment 4 Beth Dakin 2012-04-09 11:49:54 PDT
Has anything changed in the test harness in the last week that could lead to this problem? This is a really terrible issue, and I agree it's hard to debug, so maybe we can just figure out what caused it to start happening.

I don't know how long these links will be good for, but here's an example:

This test is failing: http://trac.webkit.org/export/113594/trunk/LayoutTests/media/unsupported-rtsp.html

Here's the pretty diff: http://build.webkit.org/results/Lion%20Release%20(Tests)/r113594%20(7423)/media/unsupported-rtsp-pretty-diff.html

The pretty diff is comparing the results of this test: http://trac.webkit.org/export/113594/trunk/LayoutTests/media/svg-as-image-with-media-blocked.html with the expected results of media/unsupported-rtsp.html, and it only thinks that media/unsupported-rtsp.html is failing because the generated results are from media/svg-as-image-with-media-blocked.html
Comment 5 Simon Fraser (smfr) 2012-04-09 11:53:04 PDT
On the Lion Release Tests bot, this started between 113371 and 113376.
<http://build.webkit.org/builders/Lion%20Release%20%28Tests%29?numbuilds=200>

On Snow Leopard Release Tests, this started between 113363 and 113371.
<http://build.webkit.org/builders/SnowLeopard%20Intel%20Release%20%28Tests%29?numbuilds=200>
Comment 6 Dirk Pranke 2012-04-09 17:31:12 PDT
The only thing that's changed in the harness in that timeframe that looks even remotely related is the change to run *fewer* workers on the bots:

https://bugs.webkit.org/show_bug.cgi?id=83076

It's hard to imagine that that would be a problem, though. I haven't gotten to take much of a look at this yet, will keep investigating.
Comment 7 Simon Fraser (smfr) 2012-04-09 17:41:28 PDT
My guess is that there's one test that behaves badly (a media test?), and causes the off-by-one for all the other tests run in the same shard.
Comment 8 Dirk Pranke 2012-04-09 18:41:41 PDT
(In reply to comment #7)
> My guess is that there's one test that behaves badly (a media test?), and causes the off-by-one for all the other tests run in the same shard.

I think Simon's hypothesis is a good one. I spot-checked a few logs (from builds 7423, 7428, and 7437 on the Lion Release bot) and they all appear to have one worker that starts failing every test starting with media/nodesFromRect-shadowContent.html, so it might be that test.

If you then look at the results for 7437, http://build.webkit.org/results/Lion%20Release%20(Tests)/r113622%20(7437)/results.html , you can see that the -actual output for nodesFromRect-shadowContent is repeated in the next failing test, media/remove-from-document-before-load.html . And, interestingly, the -expected for that test says "PASS: A crash did not occur when the media element was removed before loading.".

So, I'm guessing that that test is actually crashing, and we're not noticing it, and somehow the internal output buffers in NRWT aren't getting emptied, so we end up reusing the same output twice and everything goes to hell after that.

Note that that is the test that is being moved in r113371, so I think we have a pretty good suspect :).

Maybe try skipping that test and see if the problem goes away? 

There's almost certainly also a bug in NRWT that maybe I introduced a few weeks ago when I was reworking the crash log code, so I can start looking for that now that I have a theory, as well.
Comment 9 Beth Dakin 2012-04-09 19:09:00 PDT
Thanks for the analysis, Dirk! I will try skipping that test, and we'll see how it goes.
Comment 10 Beth Dakin 2012-04-09 19:14:11 PDT
Skipped with http://trac.webkit.org/changeset/113661
Comment 11 Antonio Gomes 2012-04-09 20:17:59 PDT
I would live to see a backtrace of this crash, if it is confirmed.
Comment 12 Antonio Gomes 2012-04-10 13:03:30 PDT
I wonder if the above patch fixes things. We would have to unskip the test again and wait for a few bot cycles.

Anyways, the changes below work fine on Gtk and Qt builds, and should go it regardless I would say.

diff --git a/LayoutTests/media/nodesFromRect-shadowContent.html b/LayoutTests/media/nodesFromRect-shadowContent.html
index d55610a..9fdaf7f 100644
--- a/LayoutTests/media/nodesFromRect-shadowContent.html
+++ b/LayoutTests/media/nodesFromRect-shadowContent.html
@@ -13,11 +13,6 @@
   <script type="text/javascript" charset="utf-8">
     function runTest()
     {
-      if (window.layoutTestController) {
-        layoutTestController.dumpAsText();
-        layoutTestController.waitUntilDone();
-      }
-
       var e = {};
 
       // Set up shortcut access to elements
@@ -42,12 +37,7 @@
 
       var shadow =['-webkit-media-controls-timeline-container', '-webkit-media-controls-play-button', '-webkit-media-controls-panel', '-webkit-media-controls'];
       checkShadowContent(clickX, clickY, 10, 10, 20, 20, shadow);
-
-      if (window.layoutTestController)
-        layoutTestController.notifyDone();
     }
-
-    window.onload = runTest;
   </script>
 </head>
 <body id="body">
@@ -58,6 +48,7 @@
         testExpected("video.controls", null, '!=');
         waitForEvent('canplaythrough', function () {
             runTest();
+            endTest();
         } );
         video.src = findMediaFile("video", "content/test");
     </script>
diff --git a/LayoutTests/platform/gtk/media/nodesFromRect-shadowContent-expected.txt b/LayoutTests/platform/gtk/media/nodesFromRect-shadowContent-expected.txt
index 390e73e..371f393 100644
--- a/LayoutTests/platform/gtk/media/nodesFromRect-shadowContent-expected.txt
+++ b/LayoutTests/platform/gtk/media/nodesFromRect-shadowContent-expected.txt
@@ -4,4 +4,5 @@ This test only runs in DRT!
 
 EXPECTED (video.controls != 'null') OK
 EVENT(canplaythrough)
+END OF TEST
Comment 13 Antonio Gomes 2012-04-11 12:43:44 PDT
Created attachment 136726 [details]
(committed r113910, r=eric_carlson) speculative patch

After moving the test to this new location, I accidentally left behind an unneeded line:

"window.onload = runTest;".

It is unneeded  since now the test gets triggered by "waitForEvent('canplaythrough'"

Also, remove unneeded LTC calls, since they are called by the helper scripts imported (e.g. video-test.js).

Lets see if the Mac bots like this more...
Comment 14 Simon Fraser (smfr) 2012-04-11 13:01:09 PDT
As well as fixing the test, we should fix the harness to ensure that bad tests can't trip it up.
Comment 15 Eric Carlson 2012-04-11 13:47:46 PDT
Please file another bug about the problem with the test harness if one doesn't already exist.
Comment 16 Simon Fraser (smfr) 2012-04-11 13:49:30 PDT
This was that bug :(
Comment 17 Antonio Gomes 2012-04-11 13:51:21 PDT
(In reply to comment #16)
> This was that bug :(

Right. Want me to hold on committing the test fix?
Comment 18 Dirk Pranke 2012-04-11 14:11:46 PDT
(In reply to comment #14)
> As well as fixing the test, we should fix the harness to ensure that bad tests can't trip it up.

I will note that fixing things completely probably includes something like the #URL comment that chromium does ... also, we don't actually know for sure that there's a bug in NRWT at the moment; it could be a bug in DRT. But that's quibbling ...
Comment 19 Antonio Gomes 2012-04-11 14:20:35 PDT
Ok, will push and leave the bug opened.

Possible useful chat:

<eric_carlson> tonikitoo: the changes look OK, but do you know what was causing so many test failures
<eric_carlson> tonikitoo: calling notifyDone() twice?
<tonikitoo> eric_carlson, maybe that, and also calling runTests twice?
<tonikitoo> from window.onload= runtest and waitTillRead(runTest, ...)
<eric_carlson> tonikitoo: oh, I maybe having a test continue to generate output after it claimed to have finished...
<tonikitoo> that is my guess
Comment 20 Dirk Pranke 2012-04-11 16:09:10 PDT
(In reply to comment #19)
> Ok, will push and leave the bug opened.
> 
> Possible useful chat:
> 
> <eric_carlson> tonikitoo: the changes look OK, but do you know what was causing so many test failures
> <eric_carlson> tonikitoo: calling notifyDone() twice?
> <tonikitoo> eric_carlson, maybe that, and also calling runTests twice?
> <tonikitoo> from window.onload= runtest and waitTillRead(runTest, ...)
> <eric_carlson> tonikitoo: oh, I maybe having a test continue to generate output after it claimed to have finished...
> <tonikitoo> that is my guess

Wait, does this mean that calling runTest() from inside a test can make DRT generate two copies of the output? That seems bad.
Comment 21 Antonio Gomes 2012-04-11 21:28:33 PDT
seems like the patch is still failing on Mac Lion. Investigating...
Comment 22 Antonio Gomes 2012-04-11 21:48:39 PDT
(In reply to comment #21)
> seems like the patch is still failing on Mac Lion. Investigating...

It seems to need rebaseline:

--- /Volumes/Data/slave/lion-intel-release-tests/build/layout-test-results/media/nodesFromRect-shadowContent-expected.txt
+++ /Volumes/Data/slave/lion-intel-release-tests/build/layout-test-results/media/nodesFromRect-shadowContent-actual.txt
@@ -3,4 +3,6 @@
 This test only runs in DRT!
 
 EXPECTED (video.controls != 'null') OK
+EVENT(canplaythrough)
+END OF TEST
 
http://build.webkit.org/results/Lion%20Release%20%28Tests%29/r113910%20%287504%29/media/nodesFromRect-shadowContent-diff.txt
Comment 23 Antonio Gomes 2012-04-11 22:29:26 PDT
Created attachment 136826 [details]
rebaseline
Comment 24 WebKit Review Bot 2012-04-11 23:05:45 PDT
Comment on attachment 136826 [details]
rebaseline

Clearing flags on attachment: 136826

Committed r113944: <http://trac.webkit.org/changeset/113944>
Comment 25 WebKit Review Bot 2012-04-11 23:05:52 PDT
All reviewed patches have been landed.  Closing bug.
Comment 26 Antonio Gomes 2012-04-12 05:39:02 PDT
.
Comment 27 Tim Horton 2012-09-06 18:32:14 PDT
Has anyone seen this recently? Should this bug be closed?
Comment 28 Simon Fraser (smfr) 2012-09-07 11:02:16 PDT
I think we're keeping this open to fix:
"As well as fixing the test, we should fix the harness to ensure that bad tests can't trip it up."