Bug 28845 - REGRESSION: media/video-size-intrinsic-scale.html (and other media tests?) crashing/timing-out intermittently
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs
Version: 528+ (Nightly build)
Hardware: PC OS X 10.5
Importance: P1 Normal
Assignee: Simon Fraser (smfr)
URL:
Keywords: InRadar
Depends on:
Blocks: 29035 28624 29037 38912
Reported: 2009-08-31 03:54 PDT by Eric Seidel (no email)
Modified: 2010-05-11 09:23 PDT

See Also:


Attachments
crash report for media/view-src-add-src (51.30 KB, text/plain)
2009-08-31 03:54 PDT, Eric Seidel (no email)
same crash, seen with compositing/geometry/clipping-foreground on 8/28 (47.46 KB, text/plain)
2009-08-31 03:57 PDT, Eric Seidel (no email)
Skip media/video-size-intrinsic-scale.html on leopard and re-enable media/video-source-add-src.html (4.15 KB, patch)
2009-09-16 19:38 PDT, Eric Seidel (no email)

Description Eric Seidel (no email) 2009-08-31 03:54:58 PDT
Created attachment 38807 [details]
crash report for media/view-src-add-src

media/video-source-add-src.html (and other media tests?) crashing intermittently 

As seen in bug 28827 (by the commit queue).

Attaching crash log.
Comment 1 Eric Seidel (no email) 2009-08-31 03:57:40 PDT
Created attachment 38808 [details]
same crash, seen with compositing/geometry/clipping-foreground on 8/28

Here is another crash report (of the same crash) seen while trying to land bug 28709:
compositing/geometry/clipping-foreground.html -> crashed
Comment 2 Eric Seidel (no email) 2009-08-31 04:00:12 PDT
The first report of this I see locally is from 8/28, so I assume it is a recent regression.
Comment 3 Eric Seidel (no email) 2009-09-01 03:37:14 PDT
Just had media/video-source-add-src.html timeout while trying to land bug 28808.  I expect it's a similar issue.
Comment 4 Eric Seidel (no email) 2009-09-01 03:39:33 PDT
The bots just saw this timeout too!
http://build.webkit.org/results/Leopard%20Intel%20Release%20(Tests)/r47923%20(4574)/results.html
Comment 5 Eric Seidel (no email) 2009-09-01 16:16:18 PDT
Hit this again when trying to land bug 28844. :(  I think we should consider disabling this test as it seems most prone to crash.
Comment 6 Simon Fraser (smfr) 2009-09-01 16:29:42 PDT
Eric is back next week.
Comment 7 Eric Seidel (no email) 2009-09-01 16:36:44 PDT
media/video-source-add-src.html -> timed out
Saw the timeout again with bug 28776.
Comment 8 Eric Seidel (no email) 2009-09-01 16:39:11 PDT
This could have the same root cause as bug 28624.
Comment 9 Eric Seidel (no email) 2009-09-01 16:49:01 PDT
media/video-source-add-src.html has timed out the last 2 runs on the bots:
http://build.webkit.org/results/Leopard%20Intel%20Release%20(Tests)/r47948%20(4600)/
http://build.webkit.org/results/Leopard%20Intel%20Release%20(Tests)/r47949%20(4601)/

Time to skip this test. :(  The revision which "started" this recent bout of timeouts is unrelated: http://trac.webkit.org/changeset/47948
Comment 10 Eric Seidel (no email) 2009-09-01 16:52:44 PDT
I will attach and then commit a patch which moves media/video-source-add-src.html to media/video-source-add-src.html-disabled to make the bots green.  Eric can re-enable this test when he returns next week. :(
Comment 11 Simon Fraser (smfr) 2009-09-01 16:54:12 PDT
Why not just add it to the skipped list?
Comment 12 Eric Seidel (no email) 2009-09-01 16:56:21 PDT
I could add it to all the skipped lists if you prefer.  That seems like a larger change for little/no? gain.  I'm comfortable with either solution.
Comment 13 Eric Seidel (no email) 2009-09-01 17:15:56 PDT
Committed r47951: <http://trac.webkit.org/changeset/47951>
Comment 14 Eric Seidel (no email) 2009-09-01 17:16:37 PDT
Came to consensus with Simon over IRC in #webkit.
Comment 15 Simon Fraser (smfr) 2009-09-01 17:18:14 PDT
<rdar://problem/7189153>
Comment 16 Eric Seidel (no email) 2009-09-02 00:47:49 PDT
I saw another example of this crash last night while trying to land bug 28225:
compositing/color-matching/image-color-matching.html -> crashed
So sadly skipping the test did not get rid of the crashing flakiness.  It might have gotten rid of the unexplained time outs, but we'll need to dig more to solve this crasher. :(
Comment 17 Simon Fraser (smfr) 2009-09-02 09:14:17 PDT
This may be a bug in a system framework on 10.5.8. Eric, you could try running 10.6 to see if the crashes happen there.
Comment 18 Eric Seidel (no email) 2009-09-02 13:51:42 PDT
Unfortunately I don't yet have Snow Leopard and it's not officially supported for Google Macs yet (we have some outstanding radars with Apple specific to our setup).

That said, even if this is leopard specific... we may need to work around it for leopard users.  I think we need to start by one of us staring at that crash trace and theorizing what could be hosed in it.
Comment 19 Simon Fraser (smfr) 2009-09-02 16:18:23 PDT
It's something in Core Animation/AppKit teardown. I'll follow up in a week and a half when I get back from vacation.
Comment 20 Eric Seidel (no email) 2009-09-03 15:57:29 PDT
Trying to land bug 28406 was yet another victim to this regression. :(
Comment 21 Eric Seidel (no email) 2009-09-04 00:42:36 PDT
Bug 28961 was yet another victim!  Blocked from landing by a flaky media test crash. :(
Comment 22 Eric Seidel (no email) 2009-09-04 03:21:18 PDT
Likely related timeout while trying to land bug 28931. :(
Comment 23 Eric Seidel (no email) 2009-09-06 02:04:17 PDT
Saw the timeout:
media/video-source-error.html -> timed out
when landing bug 28993 as well.  Maybe the timeouts are a separate bug, but I think they're probably related to this one.
Comment 24 Eric Seidel (no email) 2009-09-08 09:10:54 PDT
I went back through the crash logs from the commit-queue for last week.  They were all media related. :)  I filed separate bugs (bug 29035 and bug 29037) for crashes which are probably caused by this one, but have different stack traces.  Hopefully between the 3 separate stack traces what is wrong will be obvious to Eric or Simon. :(
Comment 25 Eric Seidel (no email) 2009-09-09 09:55:28 PDT
video-no-autoplay just crashed on a Leopard bot.  w/o a crash log it's difficult to know if it's related or not, but given how many different tests have produced these similar crash dumps, it probably is:
http://build.webkit.org/results/Leopard%20Intel%20Release%20(Tests)/r48210%20(4835)/results.html
Comment 26 Eric Seidel (no email) 2009-09-09 16:00:01 PDT
I'm unsure what to do here.  It may be time for me to dive in and try to fix this myself.  This bug is the largest pain point with the commit-queue right now. :(
The bots just hit this again:
http://build.webkit.org/results/Leopard%20Intel%20Release%20(Tests)/r48231%20(4853)/results.html
Comment 27 Eric Seidel (no email) 2009-09-09 16:14:16 PDT
I wonder if we could catch this by running the media layout tests with NSZombies enabled.
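
A minimal sketch of what that could look like. NSZombieEnabled is the Foundation environment variable that keeps deallocated Objective-C objects around as zombies so that messages sent to them get logged. The helper name with_zombies is hypothetical, and whether run-webkit-tests passes the variable through to DumpRenderTree is an assumption that has not been checked here.

```shell
#!/bin/sh
# Hypothetical helper: run any command with NSZombieEnabled=YES set only for
# that command (and its children), leaving the calling shell untouched.
with_zombies() {
  env NSZombieEnabled=YES "$@"
}

# Demo with a stand-in command that just reports what it sees.
with_zombies sh -c 'echo "NSZombieEnabled=$NSZombieEnabled"'
```

With the scripts used in this bug, the call would be something like with_zombies WebKitTools/Scripts/run-webkit-tests media, assuming the variable reaches the DumpRenderTree child process.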
Comment 28 Eric Seidel (no email) 2009-09-09 17:15:27 PDT
> run-webkit-tests --guard media
Testing 92 test cases.
media ............................
media/remove-from-document.html -> failed
media/video-aspect-ratio.html -> failed
media/video-layer-crash.html -> failed
media/video-load-networkState.html -> failed
media/video-loop.html -> failed
media/video-played-ranges-1.html -> failed
media/video-seek-past-end-playing.html -> failed
media/video-seeking.html -> failed
media/video-transformed.html -> failed
media/video-zoom-controls.html -> failed
media/video-zoom.html -> failed

Most of them were timeouts, but a few of them were actual failures which could be memory related.

(I removed some of the extra dots from the output)
Comment 29 Eric Seidel (no email) 2009-09-11 09:30:17 PDT
3 more rejections last night, and another crash on the bots. :(
http://build.webkit.org/results/Leopard%20Intel%20Release%20(Tests)/r48301%20(4915)/
Comment 30 Eric Seidel (no email) 2009-09-11 17:26:57 PDT
Another instance on the bots:
http://build.webkit.org/results/Leopard%20Intel%20Release%20(Tests)/r48323%20(4938)/results.html
Four more spurious rejections today, landing bug 29210, bug 29159, bug 28831, and bug 26715. :(
Comment 31 Eric Seidel (no email) 2009-09-11 17:28:21 PDT
It might be possible to write a git bisect command which ran the media layout tests 10 times and exited 0 if they all passed.  That could help us figure out which revision caused this failure.
Comment 32 Eric Seidel (no email) 2009-09-11 17:37:38 PDT
test.sh:
#!/bin/bash
build-webkit || exit 125  # ignore broken builds
for (( i = 0; i < 10; i++ ))  # (( )) loops need bash, not sh; runs the tests 10 times
do
  build-dumprendertree || exit 125 # ignore builds with a broken DRT build
  run-webkit-tests media || exit 1 # Mark builds as bad if the media tests fail.
done


git bisect start known_bad_revision known_good_revision
git bisect run ./test.sh

known_bad_revision would be something from August 21st or whenever we decided this first started failing.

known_good_revision would be a git commit from sometime before that, maybe August 1st?
Comment 33 Eric Seidel (no email) 2009-09-11 18:14:14 PDT
Here is a more realistic test.sh:
#!/bin/bash
# The (( )) loop below is a bash feature, so this must not run under plain sh.
echo "Building WebKit"
WebKitTools/Scripts/build-webkit > /dev/null || exit 125  # Ignore broken builds.
for (( i = 0; i < 10; i++ ))  # Run the media tests 10 times.
do
  WebKitTools/Scripts/build-dumprendertree > /dev/null || exit 125 # Ignore revisions with a broken DRT build.
  WebKitTools/Scripts/run-webkit-tests --quiet --no-sample-on-timeout --no-launch-safari --exit-after-n-failures=1 media || exit 1 # Mark builds as bad if the media tests fail.
done
echo "Passed"

Unfortunately that still produces false positives. :(  I'm not able to get this to fail reliably enough to test yet.
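
If the problem is that a bad revision sometimes survives all of its runs, one option is to make the repeat count a parameter and raise it until the failure shows up often enough. A minimal sketch, using the exit codes that git bisect run documents (0 for good, 125 for skip, any other code from 1 to 127 for bad). The function name passes_n_times and the demo commands are hypothetical stand-ins, not WebKit scripts.

```shell
#!/bin/sh
# Hypothetical helper: run a command N times, stopping at the first failure.
# Returns 1 (bisect "bad") as soon as one run fails, and 0 (bisect "good")
# only if every run passed.
passes_n_times() {
  attempts=$1
  shift
  i=0
  while [ "$i" -lt "$attempts" ]; do
    "$@" || return 1
    i=$((i + 1))
  done
  return 0
}

# Demo with stand-in commands instead of the real test runner.
if passes_n_times 3 true; then echo "good"; else echo "bad"; fi
if passes_n_times 3 false; then echo "good"; else echo "bad"; fi
```

In a bisect script the last line would then be something like passes_n_times 50 WebKitTools/Scripts/run-webkit-tests media || exit 1. A higher count only lowers the odds of a bad revision passing by luck; it cannot rule it out.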
Comment 34 Eric Seidel (no email) 2009-09-11 18:18:22 PDT
I've not yet seen these crashes on the Snow Leopard bot.  However the Snow Leopard bot hasn't been around very long, and it's possible that bug 29216 is masking one of these failures.
Comment 35 Eric Seidel (no email) 2009-09-11 18:39:32 PDT
HUZZAH!  I found the culprit!

media/video-size-intrinsic-scale.html

run-webkit-tests media/video-size-intrinsic-scale.html media/video-size-intrinsic-scale.html ... (pasted 100 more times or so)

crashes reliably for me!  Generally it crashes after about the 8th run.

We may just be able to skip this test on Leopard to get rid of this bug!
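
Rather than pasting the path by hand, the repeated argument list can be generated. A small sketch; the helper name repeat_arg is hypothetical, and it assumes the path contains no whitespace, since the output is meant to be word-split by the shell.

```shell
#!/bin/sh
# Hypothetical helper: print ARG repeated COUNT times, separated by spaces.
repeat_arg() {
  count=$1
  arg=$2
  i=0
  while [ "$i" -lt "$count" ]; do
    printf '%s ' "$arg"
    i=$((i + 1))
  done
}

# Demo: echo the command line that three repeats would produce, without
# actually running the tests. The unquoted $(...) is intentional so the
# shell splits the output into separate arguments.
echo run-webkit-tests $(repeat_arg 3 media/video-size-intrinsic-scale.html)
```

Dropping the echo and using 100 instead of 3 gives the invocation described above.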
Comment 36 Eric Seidel (no email) 2009-09-11 18:59:48 PDT
OK.  Time for me to retire for the weekend.  I'll look more at this on Monday, including considering skipping media/video-size-intrinsic-scale.html on Leopard until Eric/Simon have a chance to figure out what's actually wrong.
Comment 37 Simon Fraser (smfr) 2009-09-15 13:02:22 PDT
I ran media/video-source-add-src.html 1000 times with no crash. I did see one hang related to this bug when running the media tests 100 times. Would be nice to know how to make this more reproducible.
Comment 38 Eric Seidel (no email) 2009-09-15 13:14:53 PDT
(In reply to comment #37)
> I ran media/video-source-add-src.html 1000 times with no crash. I did see one
> hang related to this bug when running the media tests 100 times. Would be nice
> to know how to make this more reproducible.

The crash is repeatable for me running media/video-size-intrinsic-scale.html a few times. :)  add-src is not the problem test, it just happened to be the test after media/video-size-intrinsic-scale.html.
Comment 39 Simon Fraser (smfr) 2009-09-15 14:52:57 PDT
I can run media/video-size-intrinsic-scale.html lots of times with no issues.

Eric, what's your hardware?
Comment 40 Eric Seidel (no email) 2009-09-15 15:04:08 PDT
(In reply to comment #39)
> I can run media/video-size-intrinsic-scale.html lots of times with no issues.
> 
> Eric, what's your hardware?

Mac Pro (Quad).  I'll send you a system profiler report.

> WebKitTools/Scripts/run-webkit-tests --iterations 100 --no-sample-on-timeout media/video-size-intrinsic-scale.html
Testing 1 test cases 100 times.
media ............
media/video-size-intrinsic-scale.html -> timed out
....................
media/video-size-intrinsic-scale.html -> timed out
............................
media/video-size-intrinsic-scale.html -> timed out
............
media/video-size-intrinsic-scale.html -> timed out
...........................
media/video-size-intrinsic-scale.html -> timed out
.

I'm not seeing it crash today, interesting.
Comment 41 Simon Fraser (smfr) 2009-09-15 15:08:37 PDT
Interesting, I'm testing on a dual CPU iMac. I'll try on my Mac Pro.
Comment 42 Eric Seidel (no email) 2009-09-15 15:51:27 PDT
Wow, interesting.

I did a clean build of Debug (previously I was testing with release):
run-webkit-tests --iterations 100 --no-sample-on-timeout media/video-size-intrinsic-scale.html --debug

passed!

I'll try a clean build of Release to check.
Comment 43 Eric Seidel (no email) 2009-09-15 17:39:14 PDT
I did a clean Release build.  I'm able to get media/video-size-intrinsic-scale.html to fail repeatably for Release builds, but not for Debug builds.

I'm able to get media/video-size-intrinsic-scale.html to crash running DRT under the debugger in Xcode by passing the full path to media/video-size-intrinsic-scale.html 10 times in the arguments panel for the DumpRenderTree executable in my WebCore target.
Comment 44 Eric Seidel (no email) 2009-09-15 17:48:54 PDT
It would now be possible to write a git bisect command to test if this is a regression.  That would be complicated by the fact that a bunch of flags that we would want to use in run-webkit-tests have only recently been added.  Probably would be simplest to start by checking out the revision where media/video-size-intrinsic-scale.html was added and see if the test can be made to crash there.
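
Finding that revision in a git checkout could look like the sketch below. It relies on git log with --diff-filter=A, which keeps only commits that added the given path. The helper name commit_that_added is hypothetical, and the LayoutTests/ prefix in the usage note is assumed from the repository layout rather than taken from this bug.

```shell
#!/bin/sh
# Hypothetical helper: print the hash of the oldest commit that added PATH.
# git log lists newest commits first, so the last line is the original
# addition even if the file was later deleted and added again.
commit_that_added() {
  git log --diff-filter=A --format=%H -- "$1" | tail -n 1
}
```

From the top of the checkout, git checkout "$(commit_that_added LayoutTests/media/video-size-intrinsic-scale.html)" would then move to the revision that introduced the test.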
Comment 45 Simon Fraser (smfr) 2009-09-16 12:54:48 PDT
This turns out to be an issue in CoreVideo, tracked via <rdar://problem/7228836>. I can't see any way to work around it, other than running Snow Leopard.
Comment 46 Eric Seidel (no email) 2009-09-16 13:03:36 PDT
Do we know if this is a regression in 10.5.8?  Did we start tripping this with some certain tests?  Can we skip those tests on Leopard?

As far as I can tell this crasher is relatively new. :)  I don't know if it's caused by code changes on our part, the 10.5.8 update, or by test additions, but presumably you could tell us given your radar knowledge. :)
Comment 47 Simon Fraser (smfr) 2009-09-16 13:22:29 PDT
The bug is related to the use of CVDisplayLinks, which are used by Core Animation when hardware compositing is enabled. So it's likely that this started to show up more when we turned on hardware-compositing for <video>. From what I can determine, it's a race condition that is simply a function of the frequency with which we turn over CVDisplayLinks.

I tried changing the order of teardown of video elements vs. the WebHTMLView, and that did not avoid the bug.
Comment 48 Eric Seidel (no email) 2009-09-16 13:28:42 PDT
(In reply to comment #47)
> The bug is related to the use of CVDisplayLinks, which are used by Core
> Animation when hardware compositing is enabled. So it's likely that this
> started to show up more when we turned on hardware-compositing for <video>.
> From what I can determine, it's a race condition that is simply a function of
> the frequency with which we turn over CVDisplayLinks.
> 
> I tried changing the order of teardown of video elements vs. the WebHTMLView,
> and that did not avoid the bug.

Do you know if this bug is triggered by all video tests, or just a couple of them?  If it's just a couple of them it would be easy to skip them on Leopard.
Comment 49 Simon Fraser (smfr) 2009-09-16 13:32:36 PDT
I don't see any obvious correlation between occurrences of the bug, and what any specific test is doing (other than triggering hardware). However, I guess we could try skipping media/video-size-intrinsic-scale.html on leopard.
Comment 50 Eric Seidel (no email) 2009-09-16 19:38:24 PDT
Created attachment 39676 [details]
Skip media/video-size-intrinsic-scale.html on leopard and re-enable media/video-source-add-src.html
Comment 51 WebKit Commit Bot 2009-09-17 12:52:18 PDT
Comment on attachment 39676 [details]
Skip media/video-size-intrinsic-scale.html on leopard and re-enable media/video-source-add-src.html

Clearing flags on attachment: 39676

Committed r48485: <http://trac.webkit.org/changeset/48485>
Comment 52 WebKit Commit Bot 2009-09-17 12:52:27 PDT
All reviewed patches have been landed.  Closing bug.
Comment 53 Eric Seidel (no email) 2009-09-17 13:34:51 PDT
Hopefully these crashes will all go away now.  I'll close the rest of the dependent bugs if those stop too.
Comment 54 Eric Seidel (no email) 2009-09-22 09:57:49 PDT
Triple whammy on the bots just now. :(

http://build.webkit.org/results/Leopard%20Intel%20Release%20(Tests)/r48636%20(5229)/results.html

Tests that timed out:

compositing/color-matching/image-color-matching.html	stderr
media/video-source.html	stderr

Tests that caused the DumpRenderTree tool to crash:

media/controls-right-click-on-timebar.html	stderr

I suspect those are all related to this root cause.