Bug 72039 - Tests occasionally report missing expectations
: Tests occasionally report missing expectations
Status: RESOLVED FIXED
: WebKit
Tools / Tests
: 528+ (Nightly build)
: Unspecified Unspecified
: P2 Normal
Assigned To:
:
:
: 72109
:
  Show dependency treegraph
 
Reported: 2011-11-10 11:53 PST by
Modified: 2013-07-24 16:29 PST (History)


Attachments
Patch (1.44 KB, patch)
2011-11-14 04:44 PST, Tony Gentilcore
no flags Review Patch | Details | Formatted Diff | Diff
Patch (1.54 KB, patch)
2011-11-14 08:38 PST, Tony Gentilcore
no flags Review Patch | Details | Formatted Diff | Diff
Patch (1.55 KB, patch)
2011-11-14 08:58 PST, Tony Gentilcore
no flags Review Patch | Details | Formatted Diff | Diff
Patch (2.03 KB, patch)
2011-11-15 06:51 PST, Tony Gentilcore
no flags Review Patch | Details | Formatted Diff | Diff


Note

You need to log in before you can comment on or make changes to this bug.


Description From 2011-11-10 11:53:44 PST
Ojan and I both noticed this while gardening today. We have several flaky tests that typically PASS, but every so often report MISSING.

It seems like this must be a script bug.

For example:
http://test-results.appspot.com/dashboards/flakiness_dashboard.html#tests=platform%2Fchromium-cg-mac%2Fediting%2Finput%2Fime-candidate-window-position.html
------- Comment #1 From 2011-11-10 11:59:04 PST -------
What I observe is that it's trying to do a pixel diff on dumpAsText tests. It obviously can't find the png and reports it as MISSING. Looking at the test, it's hard for me to see how it's possible that these tests would not hit the dumpAsText line. 

Also, notably, it's getting a JS exception saying that "debug" is not defined, which is defined in the same file as the dumpAsText call. So, sometimes that file isn't getting loaded. But the path is correct and most of the time it gets loaded fine.

In conclusion, my best guess is that Chrome/WebKit has a bug with script loading where we continue loading the page instead of blocking until the script has loaded. Not sure who the experts on html parsing / script loading are. CCing a few likely candidates. :)
------- Comment #2 From 2011-11-10 12:03:11 PST -------
I have definitely seen things in the past where the test errors out before calling dumpAsText, and we get MISSING reports as a result.

Obviously one way to debug this would be to modify the test to call dumpAsText() in the main .html file and see if that makes the problem go away.
------- Comment #3 From 2011-11-10 12:05:01 PST -------
> In conclusion, my best guess is that Chrome/WebKit has a bug with script loading where we continue loading the page instead of blocking until the script has loaded.

Definitely explanative, but afaik not much has changed there lately. If we weren't blocking on script loading in certain circumstances I'd also expect that failure to manifest itself in a much louder way (ie. lots of real world bugs and other test failures).
------- Comment #4 From 2011-11-10 12:10:29 PST -------
You know, another thing is that these tests used to flakily timeout. Now they flakily go missing. No timeouts have occurred since the flaky missings started and the missings seem to be at roughly the same frequency that the timeouts were at previously.

Perhaps either we are now timing out earlier (before dumpAsText()) or timeouts before dumpAsText() are now handled differently?
------- Comment #5 From 2011-11-10 12:14:58 PST -------
(In reply to comment #4)
> You know, another thing is that these tests used to flakily timeout. Now they flakily go missing. No timeouts have occurred since the flaky missings started and the missings seem to be at roughly the same frequency that the timeouts were at previously.

The barrier between the timeouts and the missings is this relatively tight range:
http://trac.webkit.org/log/?rev=99829&stop_rev=99822&verbose=on
------- Comment #6 From 2011-11-10 12:19:51 PST -------
> The barrier between the timeouts and the missings is this relatively tight range:
> http://trac.webkit.org/log/?rev=99829&stop_rev=99822&verbose=on

https://bugs.webkit.org/show_bug.cgi?id=71859 is the only thing in that range that could be relevant (although I don't immediately see how).
------- Comment #7 From 2011-11-10 12:23:10 PST -------
I think the TIMEOUT thing is a red herring. We *know* that the tests are being run as pixel tests and that the test code is getting executed before the js-test-pre script has successfully loaded.

Whether there's a timing issue or that the js-test-pre script isn't loading at all for some reason is the question.
------- Comment #8 From 2011-11-10 15:12:23 PST -------
Tony points out http://test-results.appspot.com/dashboards/flakiness_dashboard.html#tests=http%2Ftests%2Fmultipart%2Fpolicy-ignore-crash.php%2Chttp%2Ftests%2Fmultipart%2Fmultipart-wait-before-boundary.html.

http/tests/multipart/multipart-wait-before-boundary.html runs right before http/tests/multipart/policy-ignore-crash.php. And http/tests/multipart/policy-ignore-crash.php only passes when http/tests/multipart/multipart-wait-before-boundary.html times out!
------- Comment #9 From 2011-11-11 03:24:33 PST -------
(In reply to comment #8)
> Tony points out http://test-results.appspot.com/dashboards/flakiness_dashboard.html#tests=http%2Ftests%2Fmultipart%2Fpolicy-ignore-crash.php%2Chttp%2Ftests%2Fmultipart%2Fmultipart-wait-before-boundary.html.
> 
> http/tests/multipart/multipart-wait-before-boundary.html runs right before http/tests/multipart/policy-ignore-crash.php. And http/tests/multipart/policy-ignore-crash.php only passes when http/tests/multipart/multipart-wait-before-boundary.html times out!

Good observation. Along the same lines, I notice that ime-candidate-window-position.html goes missing when it runs immediately after mhtml/simple_page_unmht.mht and passes when it runs after something else.

Based on what looks to me to be the regression range and the mhtml connection, I'm going to try rolling out http://trac.webkit.org/changeset/99826 . If a "missing" happens after that, I'll roll it forward again right away.
------- Comment #10 From 2011-11-11 06:22:43 PST -------
Looks like after the rollout the flaky timeouts of before are back and the missings are gone.

Closing this bug.
------- Comment #11 From 2011-11-11 08:02:25 PST -------
Looks like this still happens on fast/forms/form-associated-element-crash3.html and it isn't strongly tied to which test runs immediately before it (usually is fast/forms/form-associated-element-crash2.html).

http://test-results.appspot.com/dashboards/flakiness_dashboard.html#tests=fast%2Fforms%2Fform-associated-element-crash3.html

It seems to have been happening there for a long time and it looks independent of the problem that was on ime-candidate-window-position.
------- Comment #12 From 2011-11-11 10:06:49 PST -------
So what is the conclusion.
Was my CL responsible?
------- Comment #13 From 2011-11-14 04:36:48 PST -------
(In reply to comment #12)
> So what is the conclusion.
> Was my CL responsible?

Rolling out that patch fixed this issue for ime-candidate-window-position.html. You can see the "missings" are all exactly correlated with landing and rolling out the patch here:
http://test-results.appspot.com/dashboards/flakiness_dashboard.html#tests=platform%2Fchromium-cg-mac%2Fediting%2Finput%2Fime-candidate-window-position.html

This is still occurring on form-associated-element-crash3.html, but it occurred there prior to your patch.
http://test-results.appspot.com/dashboards/flakiness_dashboard.html#tests=fast%2Fforms%2Fform-associated-element-crash3.html

I suspect there is a different underlying issue there, but this needs investigation.
------- Comment #14 From 2011-11-14 04:44:19 PST -------
Created an attachment (id=114919) [details]
Patch
------- Comment #15 From 2011-11-14 04:44:59 PST -------
Committed r100128: <http://trac.webkit.org/changeset/100128>
------- Comment #16 From 2011-11-14 08:38:10 PST -------
Created an attachment (id=114953) [details]
Patch
------- Comment #17 From 2011-11-14 08:41:18 PST -------
Committed r100147: <http://trac.webkit.org/changeset/100147>
------- Comment #18 From 2011-11-14 08:43:39 PST -------
Same symptoms on fast/forms/form-associated-element-crash.html and fast/forms/form-attribute-elements-order2.html.
http://test-results.appspot.com/dashboards/flakiness_dashboard.html#tests=fast%2Fforms%2Fform-associated-element-crash.html%2Cfast%2Fforms%2Fform-attribute-elements-order2.html

First occurrence there was before r99864.
------- Comment #19 From 2011-11-14 08:58:31 PST -------
Created an attachment (id=114958) [details]
Patch
------- Comment #20 From 2011-11-14 08:59:32 PST -------
Committed r100151: <http://trac.webkit.org/changeset/100151>
------- Comment #22 From 2011-11-14 09:53:19 PST -------
I think I have a good idea what is causing this.

My test refactoring removes the <link rel=stylesheet> from the html files. Instead it injects it using js. However, when the link is inserted using js it might not be ready when the first layout happens. The css file has some white-space: pre so that would explain why we are seeing missing newlines and other white spaces.
------- Comment #23 From 2011-11-15 06:48:55 PST -------
3 more with the same symptoms:
fast/forms/form-attribute-elements.html
fast/forms/form-attribute-nonexistence-form-id.html
fast/forms/form-attribute-elements-order.html
------- Comment #24 From 2011-11-15 06:51:57 PST -------
Created an attachment (id=115158) [details]
Patch
------- Comment #25 From 2011-11-15 06:58:31 PST -------
Committed r100278: <http://trac.webkit.org/changeset/100278>
------- Comment #26 From 2012-03-12 01:31:46 PST -------
I see this again.

http://test-results.appspot.com/dashboards/flakiness_dashboard.html#group=%40DEPS%20-%20chromium.org&tests=fast%2Fforms%2Fform-associated-element-removal.html

fast/forms/form-associated-element-removal.html loads js-test-pre.js (so requests dumpAsText), but it emits render tree and png, so failed as MISSING.
------- Comment #27 From 2012-05-29 14:06:49 PST -------
*** Bug 87776 has been marked as a duplicate of this bug. ***
------- Comment #28 From 2012-06-12 12:58:37 PST -------
An even simpler example of a test that hits this error:
http://test-results.appspot.com/dashboards/flakiness_dashboard.html#showExpectations=true&tests=http%2Ftests%2Floading%2Fempty-subframe.html

The test page's contents are:
<script>
if (window.layoutTestController)
    layoutTestController.dumpAsText();
</script>
This is a test of load callbacks. It is only useful inside the regression test tool.<br>
<iframe name="f1"></iframe>

On the chromium mac bots it sometimes runs that test as a pixel test.

Up to now, I've assumed the bug was the dumpAsText isn't getting called. I wonder if it's actually that we're overwriting the state in DRT somehow (e.g. when running tests in parallel).
------- Comment #29 From 2012-06-12 13:29:31 PST -------
(In reply to comment #28)
> An even simpler example of a test that hits this error:
> http://test-results.appspot.com/dashboards/flakiness_dashboard.html#showExpectations=true&tests=http%2Ftests%2Floading%2Fempty-subframe.html
> 
> The test page's contents are:
> <script>
> if (window.layoutTestController)
>     layoutTestController.dumpAsText();
> </script>
> This is a test of load callbacks. It is only useful inside the regression test tool.<br>
> <iframe name="f1"></iframe>
> 
> On the chromium mac bots it sometimes runs that test as a pixel test.
> 
> Up to now, I've assumed the bug was the dumpAsText isn't getting called. I wonder if it's actually that we're overwriting the state in DRT somehow (e.g. when running tests in parallel).

Interesting. It seems like it would be hard for that not to be called.

I wonder if we see this issue across all the ports? I suppose on non-pixel-testing ports we'd end up seeing this as a flaky text diff instead of a MISSING.
------- Comment #30 From 2012-06-12 13:30:41 PST -------
(In reply to comment #29)
> (In reply to comment #28)
> > An even simpler example of a test that hits this error:
> > http://test-results.appspot.com/dashboards/flakiness_dashboard.html#showExpectations=true&tests=http%2Ftests%2Floading%2Fempty-subframe.html
> > 
> > The test page's contents are:
> > <script>
> > if (window.layoutTestController)
> >     layoutTestController.dumpAsText();
> > </script>
> > This is a test of load callbacks. It is only useful inside the regression test tool.<br>
> > <iframe name="f1"></iframe>
> > 
> > On the chromium mac bots it sometimes runs that test as a pixel test.
> > 
> > Up to now, I've assumed the bug was the dumpAsText isn't getting called. I wonder if it's actually that we're overwriting the state in DRT somehow (e.g. when running tests in parallel).
> 
> Interesting. It seems like it would be hard for that not to be called.
> 
> I wonder if we see this issue across all the ports? I suppose on non-pixel-testing ports we'd end up seeing this as a flaky text diff instead of a MISSING.

This also makes me think that one thing we could do would be, if we were to get a missing pixel result, go back and look at the text we got and see if it looks like we got an actual render tree but the expected result wasn't a render tree, and treat that as a TEXT+IMAGE instead.
------- Comment #31 From 2012-08-20 17:00:22 PST -------
*** Bug 94532 has been marked as a duplicate of this bug. ***
------- Comment #32 From 2012-11-02 16:23:29 PST -------
This is now happening on XP as well:
http://test-results.appspot.com/dashboards/flakiness_dashboard.html#tests=fast%2Fforms%2Fformaction-attribute.html

I'm going to make the test expectation general, assuming it'll start happening on other platforms as well.
------- Comment #33 From 2012-11-02 16:24:37 PST -------
(In reply to comment #32)
> This is now happening on XP as well:
> http://test-results.appspot.com/dashboards/flakiness_dashboard.html#tests=fast%2Fforms%2Fformaction-attribute.html
> 
> I'm going to make the test expectation general, assuming it'll start happening on other platforms as well.

* for fast/forms/formaction-attribute.html only.
------- Comment #34 From 2013-07-24 16:29:51 PST -------
I don't think we've seen these failures since 2012.