Ojan and I both noticed this while gardening today. We have several flaky tests that typically PASS, but every so often report MISSING. It seems like this must be a script bug. For example: http://test-results.appspot.com/dashboards/flakiness_dashboard.html#tests=platform%2Fchromium-cg-mac%2Fediting%2Finput%2Fime-candidate-window-position.html
What I observe is that it's trying to do a pixel diff on dumpAsText tests. It obviously can't find the png and reports it as MISSING. Looking at the test, it's hard for me to see how it's possible that these tests would not hit the dumpAsText line. Also, notably, it's getting a JS exception saying that "debug" is not defined, which is defined in the same file as the dumpAsText call. So, sometimes that file isn't getting loaded. But the path is correct and most of the time it gets loaded fine. In conclusion, my best guess is that Chrome/WebKit has a bug with script loading where we continue loading the page instead of blocking until the script has loaded. Not sure who the experts on html parsing / script loading are. CCing a few likely candidates. :)
I have definitely seen things in the past where the test errors out before calling dumpAsText, and we get MISSING reports as a result. Obviously one way to debug this would be to modify the test to call dumpAsText() in the main .html file and see if that makes the problem go away.
> In conclusion, my best guess is that Chrome/WebKit has a bug with script loading where we continue loading the page instead of blocking until the script has loaded. Definitely explanative, but afaik not much has changed there lately. If we weren't blocking on script loading in certain circumstances I'd also expect that failure to manifest itself in a much louder way (ie. lots of real world bugs and other test failures).
You know, another thing is that these tests used to flakily timeout. Now they flakily go missing. No timeouts have occurred since the flaky missings started and the missings seem to be at roughly the same frequency that the timeouts were at previously. Perhaps either we are now timing out earlier (before dumpAsText()) or timeouts before dumpAsText() are now handled differently?
(In reply to comment #4) > You know, another thing is that these tests used to flakily timeout. Now they flakily go missing. No timeouts have occurred since the flaky missings started and the missings seem to be at roughly the same frequency that the timeouts were at previously. The barrier between the timeouts and the missings is this relatively tight range: http://trac.webkit.org/log/?rev=99829&stop_rev=99822&verbose=on
> The barrier between the timeouts and the missings is this relatively tight range: > http://trac.webkit.org/log/?rev=99829&stop_rev=99822&verbose=on https://bugs.webkit.org/show_bug.cgi?id=71859 is the only thing in that range that could be relevant (although I don't immediately see how).
I think the TIMEOUT thing is a red herring. We *know* that the tests are being run as pixel tests and that the test code is getting executed before the js-test-pre script has successfully loaded. Whether there's a timing issue or that the js-test-pre script isn't loading at all for some reason is the question.
Tony points out http://test-results.appspot.com/dashboards/flakiness_dashboard.html#tests=http%2Ftests%2Fmultipart%2Fpolicy-ignore-crash.php%2Chttp%2Ftests%2Fmultipart%2Fmultipart-wait-before-boundary.html. http/tests/multipart/multipart-wait-before-boundary.html runs right before http/tests/multipart/policy-ignore-crash.php. And http/tests/multipart/policy-ignore-crash.php only passes when http/tests/multipart/multipart-wait-before-boundary.html times out!
(In reply to comment #8) > Tony points out http://test-results.appspot.com/dashboards/flakiness_dashboard.html#tests=http%2Ftests%2Fmultipart%2Fpolicy-ignore-crash.php%2Chttp%2Ftests%2Fmultipart%2Fmultipart-wait-before-boundary.html. > > http/tests/multipart/multipart-wait-before-boundary.html runs right before http/tests/multipart/policy-ignore-crash.php. And http/tests/multipart/policy-ignore-crash.php only passes when http/tests/multipart/multipart-wait-before-boundary.html times out! Good observation. Along the same lines, I notice that ime-candidate-window-position.html goes missing when it runs immediately after mhtml/simple_page_unmht.mht and passes when it runs after something else. Based on what looks to me to be the regression range and the mhtml connection, I'm going to try rolling out http://trac.webkit.org/changeset/99826 . If a "missing" happens after that, I'll roll it forward again right away.
Looks like after the rollout the flaky timeouts of before are back and the missings are gone. Closing this bug.
Looks like this still happens on fast/forms/form-associated-element-crash3.html and it isn't strongly tied to which test runs immediately before it (usually is fast/forms/form-associated-element-crash2.html). http://test-results.appspot.com/dashboards/flakiness_dashboard.html#tests=fast%2Fforms%2Fform-associated-element-crash3.html It seems to have been happening there for a long time and it looks independent of the problem that was on ime-candidate-window-position.
So what is the conclusion. Was my CL responsible?
(In reply to comment #12) > So what is the conclusion. > Was my CL responsible? Rolling out that patch fixed this issue for ime-candidate-window-position.html. You can see the "missings" are all exactly correlated with landing and rolling out the patch here: http://test-results.appspot.com/dashboards/flakiness_dashboard.html#tests=platform%2Fchromium-cg-mac%2Fediting%2Finput%2Fime-candidate-window-position.html This is still occurring on form-associated-element-crash3.html, but it occurred there prior to your patch. http://test-results.appspot.com/dashboards/flakiness_dashboard.html#tests=fast%2Fforms%2Fform-associated-element-crash3.html I suspect there is a different underlying issue there, but this needs investigation.
Created attachment 114919 [details] Patch
Committed r100128: <http://trac.webkit.org/changeset/100128>
Created attachment 114953 [details] Patch
Committed r100147: <http://trac.webkit.org/changeset/100147>
Same symptoms on fast/forms/form-associated-element-crash.html and fast/forms/form-attribute-elements-order2.html. http://test-results.appspot.com/dashboards/flakiness_dashboard.html#tests=fast%2Fforms%2Fform-associated-element-crash.html%2Cfast%2Fforms%2Fform-attribute-elements-order2.html First occurrence there was before r99864.
Created attachment 114958 [details] Patch
Committed r100151: <http://trac.webkit.org/changeset/100151>
Another one: http://test-results.appspot.com/dashboards/flakiness_dashboard.html#tests=http%2Ftests%2Fmultipart%2Fpolicy-ignore-crash.php
I think I have a good idea what is causing this. My test refactoring removes the <link rel=stylesheet> from the html files. Instead it injects it using js. However, when the link is inserted using js it might not be ready when the first layout happens. The css file has some white-space: pre so that would explain why we are seeing missing newlines and other white spaces.
3 more with the same symptoms: fast/forms/form-attribute-elements.html fast/forms/form-attribute-nonexistence-form-id.html fast/forms/form-attribute-elements-order.html
Created attachment 115158 [details] Patch
Committed r100278: <http://trac.webkit.org/changeset/100278>
I see this again. http://test-results.appspot.com/dashboards/flakiness_dashboard.html#group=%40DEPS%20-%20chromium.org&tests=fast%2Fforms%2Fform-associated-element-removal.html fast/forms/form-associated-element-removal.html loads js-test-pre.js (so requests dumpAsText), but it emits render tree and png, so failed as MISSING.
*** Bug 87776 has been marked as a duplicate of this bug. ***
An even simpler example of a test that hits this error: http://test-results.appspot.com/dashboards/flakiness_dashboard.html#showExpectations=true&tests=http%2Ftests%2Floading%2Fempty-subframe.html The test page's contents are: <script> if (window.layoutTestController) layoutTestController.dumpAsText(); </script> This is a test of load callbacks. It is only useful inside the regression test tool.<br> <iframe name="f1"></iframe> On the chromium mac bots it sometimes runs that test as a pixel test. Up to now, I've assumed the bug was the dumpAsText isn't getting called. I wonder if it's actually that we're overwriting the state in DRT somehow (e.g. when running tests in parallel).
(In reply to comment #28) > An even simpler example of a test that hits this error: > http://test-results.appspot.com/dashboards/flakiness_dashboard.html#showExpectations=true&tests=http%2Ftests%2Floading%2Fempty-subframe.html > > The test page's contents are: > <script> > if (window.layoutTestController) > layoutTestController.dumpAsText(); > </script> > This is a test of load callbacks. It is only useful inside the regression test tool.<br> > <iframe name="f1"></iframe> > > On the chromium mac bots it sometimes runs that test as a pixel test. > > Up to now, I've assumed the bug was the dumpAsText isn't getting called. I wonder if it's actually that we're overwriting the state in DRT somehow (e.g. when running tests in parallel). Interesting. It seems like it would be hard for that not to be called. I wonder if we see this issue across all the ports? I suppose on non-pixel-testing ports we'd end up seeing this as a flaky text diff instead of a MISSING.
(In reply to comment #29) > (In reply to comment #28) > > An even simpler example of a test that hits this error: > > http://test-results.appspot.com/dashboards/flakiness_dashboard.html#showExpectations=true&tests=http%2Ftests%2Floading%2Fempty-subframe.html > > > > The test page's contents are: > > <script> > > if (window.layoutTestController) > > layoutTestController.dumpAsText(); > > </script> > > This is a test of load callbacks. It is only useful inside the regression test tool.<br> > > <iframe name="f1"></iframe> > > > > On the chromium mac bots it sometimes runs that test as a pixel test. > > > > Up to now, I've assumed the bug was the dumpAsText isn't getting called. I wonder if it's actually that we're overwriting the state in DRT somehow (e.g. when running tests in parallel). > > Interesting. It seems like it would be hard for that not to be called. > > I wonder if we see this issue across all the ports? I suppose on non-pixel-testing ports we'd end up seeing this as a flaky text diff instead of a MISSING. This also makes me think that one thing we could do would be, if we were to get a missing pixel result, go back and look at the text we got and see if it looks like we got an actual render tree but the expected result wasn't a render tree, and treat that as a TEXT+IMAGE instead.
*** Bug 94532 has been marked as a duplicate of this bug. ***
This is now happening on XP as well: http://test-results.appspot.com/dashboards/flakiness_dashboard.html#tests=fast%2Fforms%2Fformaction-attribute.html I'm going to make the test expectation general, assuming it'll start happening on other platforms as well.
(In reply to comment #32) > This is now happening on XP as well: > http://test-results.appspot.com/dashboards/flakiness_dashboard.html#tests=fast%2Fforms%2Fformaction-attribute.html > > I'm going to make the test expectation general, assuming it'll start happening on other platforms as well. * for fast/forms/formaction-attribute.html only.
I don't think we've seen these failures since 2012.