Bug 72039

Summary: Tests occasionally report missing expectations
Product: WebKit Reporter: Tony Gentilcore <tonyg>
Component: Tools / TestsAssignee: Erik Arvidsson <arv>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, aboxhall, arv, dominicc, dpranke, eric, japhet, jcivelli, kbr, ojan, rafaelw, tony, tonyg, ukai
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on: 72109    
Bug Blocks:    
Attachments:
Description Flags
Patch
none
Patch
none
Patch
none
Patch none

Description Tony Gentilcore 2011-11-10 11:53:44 PST
Ojan and I both noticed this while gardening today. We have several flaky tests that typically PASS, but every so often report MISSING.

It seems like this must be a script bug.

For example:
http://test-results.appspot.com/dashboards/flakiness_dashboard.html#tests=platform%2Fchromium-cg-mac%2Fediting%2Finput%2Fime-candidate-window-position.html
Comment 1 Ojan Vafai 2011-11-10 11:59:04 PST
What I observe is that it's trying to do a pixel diff on dumpAsText tests. It obviously can't find the png and reports it as MISSING. Looking at the test, it's hard for me to see how it's possible that these tests would not hit the dumpAsText line. 

Also, notably, it's getting a JS exception saying that "debug" is not defined, which is defined in the same file as the dumpAsText call. So, sometimes that file isn't getting loaded. But the path is correct and most of the time it gets loaded fine.

In conclusion, my best guess is that Chrome/WebKit has a bug with script loading where we continue loading the page instead of blocking until the script has loaded. Not sure who the experts on html parsing / script loading are. CCing a few likely candidates. :)
Comment 2 Dirk Pranke 2011-11-10 12:03:11 PST
I have definitely seen things in the past where the test errors out before calling dumpAsText, and we get MISSING reports as a result.

Obviously one way to debug this would be to modify the test to call dumpAsText() in the main .html file and see if that makes the problem go away.
Comment 3 Tony Gentilcore 2011-11-10 12:05:01 PST
> In conclusion, my best guess is that Chrome/WebKit has a bug with script loading where we continue loading the page instead of blocking until the script has loaded.

Definitely explanative, but afaik not much has changed there lately. If we weren't blocking on script loading in certain circumstances I'd also expect that failure to manifest itself in a much louder way (ie. lots of real world bugs and other test failures).
Comment 4 Tony Gentilcore 2011-11-10 12:10:29 PST
You know, another thing is that these tests used to flakily timeout. Now they flakily go missing. No timeouts have occurred since the flaky missings started and the missings seem to be at roughly the same frequency that the timeouts were at previously.

Perhaps either we are now timing out earlier (before dumpAsText()) or timeouts before dumpAsText() are now handled differently?
Comment 5 Tony Gentilcore 2011-11-10 12:14:58 PST
(In reply to comment #4)
> You know, another thing is that these tests used to flakily timeout. Now they flakily go missing. No timeouts have occurred since the flaky missings started and the missings seem to be at roughly the same frequency that the timeouts were at previously.

The barrier between the timeouts and the missings is this relatively tight range:
http://trac.webkit.org/log/?rev=99829&stop_rev=99822&verbose=on
Comment 6 Tony Gentilcore 2011-11-10 12:19:51 PST
> The barrier between the timeouts and the missings is this relatively tight range:
> http://trac.webkit.org/log/?rev=99829&stop_rev=99822&verbose=on

https://bugs.webkit.org/show_bug.cgi?id=71859 is the only thing in that range that could be relevant (although I don't immediately see how).
Comment 7 Ojan Vafai 2011-11-10 12:23:10 PST
I think the TIMEOUT thing is a red herring. We *know* that the tests are being run as pixel tests and that the test code is getting executed before the js-test-pre script has successfully loaded.

Whether there's a timing issue or that the js-test-pre script isn't loading at all for some reason is the question.
Comment 8 Ojan Vafai 2011-11-10 15:12:23 PST
Tony points out http://test-results.appspot.com/dashboards/flakiness_dashboard.html#tests=http%2Ftests%2Fmultipart%2Fpolicy-ignore-crash.php%2Chttp%2Ftests%2Fmultipart%2Fmultipart-wait-before-boundary.html.

http/tests/multipart/multipart-wait-before-boundary.html runs right before http/tests/multipart/policy-ignore-crash.php. And http/tests/multipart/policy-ignore-crash.php only passes when http/tests/multipart/multipart-wait-before-boundary.html times out!
Comment 9 Tony Gentilcore 2011-11-11 03:24:33 PST
(In reply to comment #8)
> Tony points out http://test-results.appspot.com/dashboards/flakiness_dashboard.html#tests=http%2Ftests%2Fmultipart%2Fpolicy-ignore-crash.php%2Chttp%2Ftests%2Fmultipart%2Fmultipart-wait-before-boundary.html.
> 
> http/tests/multipart/multipart-wait-before-boundary.html runs right before http/tests/multipart/policy-ignore-crash.php. And http/tests/multipart/policy-ignore-crash.php only passes when http/tests/multipart/multipart-wait-before-boundary.html times out!

Good observation. Along the same lines, I notice that ime-candidate-window-position.html goes missing when it runs immediately after mhtml/simple_page_unmht.mht and passes when it runs after something else.

Based on what looks to me to be the regression range and the mhtml connection, I'm going to try rolling out http://trac.webkit.org/changeset/99826 . If a "missing" happens after that, I'll roll it forward again right away.
Comment 10 Tony Gentilcore 2011-11-11 06:22:43 PST
Looks like after the rollout the flaky timeouts of before are back and the missings are gone.

Closing this bug.
Comment 11 Tony Gentilcore 2011-11-11 08:02:25 PST
Looks like this still happens on fast/forms/form-associated-element-crash3.html and it isn't strongly tied to which test runs immediately before it (usually is fast/forms/form-associated-element-crash2.html).

http://test-results.appspot.com/dashboards/flakiness_dashboard.html#tests=fast%2Fforms%2Fform-associated-element-crash3.html

It seems to have been happening there for a long time and it looks independent of the problem that was on ime-candidate-window-position.
Comment 12 Jay Civelli 2011-11-11 10:06:49 PST
So what is the conclusion.
Was my CL responsible?
Comment 13 Tony Gentilcore 2011-11-14 04:36:48 PST
(In reply to comment #12)
> So what is the conclusion.
> Was my CL responsible?

Rolling out that patch fixed this issue for ime-candidate-window-position.html. You can see the "missings" are all exactly correlated with landing and rolling out the patch here:
http://test-results.appspot.com/dashboards/flakiness_dashboard.html#tests=platform%2Fchromium-cg-mac%2Fediting%2Finput%2Fime-candidate-window-position.html

This is still occurring on form-associated-element-crash3.html, but it occurred there prior to your patch.
http://test-results.appspot.com/dashboards/flakiness_dashboard.html#tests=fast%2Fforms%2Fform-associated-element-crash3.html

I suspect there is a different underlying issue there, but this needs investigation.
Comment 14 Tony Gentilcore 2011-11-14 04:44:19 PST
Created attachment 114919 [details]
Patch
Comment 15 Tony Gentilcore 2011-11-14 04:44:59 PST
Committed r100128: <http://trac.webkit.org/changeset/100128>
Comment 16 Tony Gentilcore 2011-11-14 08:38:10 PST
Created attachment 114953 [details]
Patch
Comment 17 Tony Gentilcore 2011-11-14 08:41:18 PST
Committed r100147: <http://trac.webkit.org/changeset/100147>
Comment 18 Tony Gentilcore 2011-11-14 08:43:39 PST
Same symptoms on fast/forms/form-associated-element-crash.html and fast/forms/form-attribute-elements-order2.html.
http://test-results.appspot.com/dashboards/flakiness_dashboard.html#tests=fast%2Fforms%2Fform-associated-element-crash.html%2Cfast%2Fforms%2Fform-attribute-elements-order2.html

First occurrence there was before r99864.
Comment 19 Tony Gentilcore 2011-11-14 08:58:31 PST
Created attachment 114958 [details]
Patch
Comment 20 Tony Gentilcore 2011-11-14 08:59:32 PST
Committed r100151: <http://trac.webkit.org/changeset/100151>
Comment 22 Erik Arvidsson 2011-11-14 09:53:19 PST
I think I have a good idea what is causing this.

My test refactoring removes the <link rel=stylesheet> from the html files. Instead it injects it using js. However, when the link is inserted using js it might not be ready when the first layout happens. The css file has some white-space: pre so that would explain why we are seeing missing newlines and other white spaces.
Comment 23 Tony Gentilcore 2011-11-15 06:48:55 PST
3 more with the same symptoms:
fast/forms/form-attribute-elements.html
fast/forms/form-attribute-nonexistence-form-id.html
fast/forms/form-attribute-elements-order.html
Comment 24 Tony Gentilcore 2011-11-15 06:51:57 PST
Created attachment 115158 [details]
Patch
Comment 25 Tony Gentilcore 2011-11-15 06:58:31 PST
Committed r100278: <http://trac.webkit.org/changeset/100278>
Comment 26 Fumitoshi Ukai 2012-03-12 01:31:46 PDT
I see this again.

http://test-results.appspot.com/dashboards/flakiness_dashboard.html#group=%40DEPS%20-%20chromium.org&tests=fast%2Fforms%2Fform-associated-element-removal.html

fast/forms/form-associated-element-removal.html loads js-test-pre.js (so requests dumpAsText), but it emits render tree and png, so failed as MISSING.
Comment 27 Ojan Vafai 2012-05-29 14:06:49 PDT
*** Bug 87776 has been marked as a duplicate of this bug. ***
Comment 28 Ojan Vafai 2012-06-12 12:58:37 PDT
An even simpler example of a test that hits this error:
http://test-results.appspot.com/dashboards/flakiness_dashboard.html#showExpectations=true&tests=http%2Ftests%2Floading%2Fempty-subframe.html

The test page's contents are:
<script>
if (window.layoutTestController)
    layoutTestController.dumpAsText();
</script>
This is a test of load callbacks. It is only useful inside the regression test tool.<br>
<iframe name="f1"></iframe>

On the chromium mac bots it sometimes runs that test as a pixel test.

Up to now, I've assumed the bug was the dumpAsText isn't getting called. I wonder if it's actually that we're overwriting the state in DRT somehow (e.g. when running tests in parallel).
Comment 29 Dirk Pranke 2012-06-12 13:29:31 PDT
(In reply to comment #28)
> An even simpler example of a test that hits this error:
> http://test-results.appspot.com/dashboards/flakiness_dashboard.html#showExpectations=true&tests=http%2Ftests%2Floading%2Fempty-subframe.html
> 
> The test page's contents are:
> <script>
> if (window.layoutTestController)
>     layoutTestController.dumpAsText();
> </script>
> This is a test of load callbacks. It is only useful inside the regression test tool.<br>
> <iframe name="f1"></iframe>
> 
> On the chromium mac bots it sometimes runs that test as a pixel test.
> 
> Up to now, I've assumed the bug was the dumpAsText isn't getting called. I wonder if it's actually that we're overwriting the state in DRT somehow (e.g. when running tests in parallel).

Interesting. It seems like it would be hard for that not to be called.

I wonder if we see this issue across all the ports? I suppose on non-pixel-testing ports we'd end up seeing this as a flaky text diff instead of a MISSING.
Comment 30 Dirk Pranke 2012-06-12 13:30:41 PDT
(In reply to comment #29)
> (In reply to comment #28)
> > An even simpler example of a test that hits this error:
> > http://test-results.appspot.com/dashboards/flakiness_dashboard.html#showExpectations=true&tests=http%2Ftests%2Floading%2Fempty-subframe.html
> > 
> > The test page's contents are:
> > <script>
> > if (window.layoutTestController)
> >     layoutTestController.dumpAsText();
> > </script>
> > This is a test of load callbacks. It is only useful inside the regression test tool.<br>
> > <iframe name="f1"></iframe>
> > 
> > On the chromium mac bots it sometimes runs that test as a pixel test.
> > 
> > Up to now, I've assumed the bug was the dumpAsText isn't getting called. I wonder if it's actually that we're overwriting the state in DRT somehow (e.g. when running tests in parallel).
> 
> Interesting. It seems like it would be hard for that not to be called.
> 
> I wonder if we see this issue across all the ports? I suppose on non-pixel-testing ports we'd end up seeing this as a flaky text diff instead of a MISSING.

This also makes me think that one thing we could do would be, if we were to get a missing pixel result, go back and look at the text we got and see if it looks like we got an actual render tree but the expected result wasn't a render tree, and treat that as a TEXT+IMAGE instead.
Comment 31 Ojan Vafai 2012-08-20 17:00:22 PDT
*** Bug 94532 has been marked as a duplicate of this bug. ***
Comment 32 Alice Boxhall 2012-11-02 16:23:29 PDT
This is now happening on XP as well:
http://test-results.appspot.com/dashboards/flakiness_dashboard.html#tests=fast%2Fforms%2Fformaction-attribute.html

I'm going to make the test expectation general, assuming it'll start happening on other platforms as well.
Comment 33 Alice Boxhall 2012-11-02 16:24:37 PDT
(In reply to comment #32)
> This is now happening on XP as well:
> http://test-results.appspot.com/dashboards/flakiness_dashboard.html#tests=fast%2Fforms%2Fformaction-attribute.html
> 
> I'm going to make the test expectation general, assuming it'll start happening on other platforms as well.

* for fast/forms/formaction-attribute.html only.
Comment 34 Eric Seidel (no email) 2013-07-24 16:29:51 PDT
I don't think we've seen these failures since 2012.