Bug 130942

Summary: REGRESSION (r166853): fast/preloader/document-write.html is very flaky
Product: WebKit Reporter: Alexey Proskuryakov <ap>
Component: DOMAssignee: Darin Adler <darin>
Status: RESOLVED FIXED    
Severity: Normal CC: beidson, commit-queue, darin, koivisto, slewis
Priority: P2 Keywords: Regression
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Attachments:
Description Flags
(huge) command to reproduce locally
none
Patch
none
Patch andersca: review+

Description Alexey Proskuryakov 2014-03-30 01:37:11 PDT
fast/preloader/document-write.html started to fail very very frequently last week, I suspect <http://trac.webkit.org/changeset/166144>. The diff seems to always be the same:

@@ -1,4 +1,3 @@
-document-write-plaintext.js has MIME type text/javascript
 script1.js has MIME type text/javascript
 
 This test requires DumpRenderTree to see the log of what resources are loaded.
Comment 1 Darin Adler 2014-03-31 09:45:25 PDT
Seems likely the test has a race condition in it. Let me take a look at how it works.
Comment 2 Darin Adler 2014-03-31 09:51:41 PDT
Is the evidence pointing to r166144 strong? I ask because after reading and re-reading that patch it seems highly unlikely that change would affect a test that does not involve <object> or <embed> elements.

I think that rolling out that change because of this test failure would be a big mistake. The failure does not reflect what this test is trying to test. The point of this test is whether the preload scanner kicks off the load of script1.js or not, and clearly that is still happening. It’s just that the test is fragile for some reason.

Antti, do you have any idea what could be making this test sometimes not log the load of document-write-plaintext.js?
Comment 3 Alexey Proskuryakov 2014-03-31 10:07:33 PDT
This test now fails about 50% of the time on some of the bots, and the earliest detected failure was on r166144:r166145. Preceding changes seem much less relevant, so yes, I would say that the evidence is fairly strong.

Complete history: http://webkit-test-results.appspot.com/dashboards/flakiness_dashboard.html#showAllRuns=true&tests=fast%2Fpreloader%2Fdocument-write.html

If we can't roll out this change, and if cannot fix it, then we should definitely disable or remove the test. It's pure noise now, only confirming that it doesn't crash.
Comment 4 Darin Adler 2014-03-31 10:18:50 PDT
(In reply to comment #3)
> If we can't roll out this change

We can, it’s just not necessarily worth it.

> if cannot fix it

No clue on the cause of the different behavior yet.

> we should definitely disable or remove the test.

Maybe.

> It's pure noise now, only confirming that it doesn't crash.

Well, the fact that script1.js appears in the output meaningfully indicates that what the test was designed to test is still working; preloading is occurring.

It’s unfortunate that the test machinery classifies that as a failure. Whether document-write-plaintext.js shows up in the output or not is not relevant to what the test is trying to test.

So the test would still have some value if we could find a way to ignore the irrelevant part.

And also it would be interesting to know why the load of document-write-plaintext.js does not always log. Maybe there’s some previous test that loads it and it is cached?
Comment 5 Alexey Proskuryakov 2014-03-31 17:06:34 PDT
document-write-plaintext.js is also used in fast/preloader/document-write-2.html (which comes before fast/preloader/document-write.html, counterintuitively).

I tried to reproduce this issue locally to see if making the resource name unique would resolve it, but I couldn't make it happen. So, I just marked the test as flaky for now in <http://trac.webkit.org/r166548>, to avoid increasing confusion.

Also, I suspect that the test is inherently flaky, because preloading could happen before JS execution starts, and thus before we call testRunner.dumpResourceResponseMIMETypes().

Assigning to Darin per his request.
Comment 6 Antti Koivisto 2014-04-01 03:06:29 PDT
(In reply to comment #5)
> document-write-plaintext.js is also used in fast/preloader/document-write-2.html (which comes before fast/preloader/document-write.html, counterintuitively).
> 
> I tried to reproduce this issue locally to see if making the resource name unique would resolve it, but I couldn't make it happen.

It might be good to just land the change. That sound like a very potential fix.

> Also, I suspect that the test is inherently flaky, because preloading could happen before JS execution starts, and thus before we call testRunner.dumpResourceResponseMIMETypes().

The script tags for the test resources are themselves inserted by document.write. Preload scanner can't see them before the script runs, in strict document order.

It would be good to come up with a more direct mechanism for testing preload scanner and resource loads in general.
Comment 7 Stephanie Lewis 2014-04-02 16:58:30 PDT
Due to PLT issues 166144 was rolled out in http://trac.webkit.org/changeset/166680
Comment 8 Alexey Proskuryakov 2014-04-02 23:22:32 PDT
Looks good after the rollout. Removed expectation in <http://trac.webkit.org/r166699>.
Comment 9 Alexey Proskuryakov 2014-04-09 09:48:08 PDT
The offending code got re-landed in <http://trac.webkit.org/r166853>, and the test is flaky again.

Added a test expectation in <http://trac.webkit.org/r167015>.
Comment 10 Alexey Proskuryakov 2014-05-01 09:32:43 PDT
What is the next step on this? The change was intended to be a refactoring with no side effects, but it broke a test.

We have more object-related regressions in the tree, from other changes. See also bug 132297, and <http://webkit-test-results.appspot.com/dashboards/flakiness_dashboard.html#showAllRuns=true&tests=plugins%2Fembed-attributes-style.html> for which there is no bug at the moment.
Comment 11 Darin Adler 2014-05-05 20:14:43 PDT
(In reply to comment #10)
> What is the next step on this? The change was intended to be a refactoring with no side effects, but it broke a test.

The next step is that I will fix this failure when I have a chance.

> We have more object-related regressions in the tree, from other changes. See also bug 132297

Is that object-related?
Comment 12 Alexey Proskuryakov 2014-05-05 22:41:21 PDT
> Is that object-related?

Yes, Tim's analysis in that bug appears to imply so. The test content is all in an <object>, and layout state is not properly synchronized with main document.

This is probably best to discuss in bug 132297. It's unfortunate that we have the additional flakiness from r166853, but I have no evidence that it's directly related to that bug.
Comment 13 Darin Adler 2014-05-06 20:48:42 PDT
Any tips on how to reproduce the failure locally? I am running the test over and over again without seeing any failures.
Comment 14 Alexey Proskuryakov 2014-05-06 22:16:04 PDT
Created attachment 230971 [details]
(huge) command to reproduce locally

This is quite hard to reproduce locally indeed. The only way I could reproduce was by replicating the exact sequence of tests as run by a single instance of DumpRenderTree on a bot. This can undoubtedly be simplified through bisection, but running only the last few tests is insufficient.

The attached command does two iterations of that. On first iteration, fast/preloader/document-write.html fails. On second iteration, fast/preloader/document-write-2.html and fast/preloader/document-write.html both fail.
Comment 15 Alexey Proskuryakov 2014-05-06 22:22:53 PDT
I'm bisecting now.
Comment 16 Alexey Proskuryakov 2014-05-06 22:32:11 PDT
Here is the minimal sequence to reproduce:

run-webkit-tests --child-processes 1 --no-retry-failures fast/dom/clone-node-load-event-crash.html fast/preloader/document-write-2.html  fast/preloader/document-write.html

This fails too:

run-webkit-tests --child-processes 1 --no-retry-failures fast/dom/clone-node-load-event-crash.html fast/preloader/document-write-2.html  fast/preloader/document-write-2.html
Comment 17 Alexey Proskuryakov 2014-05-06 22:55:04 PDT
An additional step, possibly obvious: comment out the entry for fast/preloader/document-write.html in platform/mac/TestExpectations, or run-webkit-tests will say "All 3 tests ran as expected."
Comment 18 Darin Adler 2014-05-07 18:03:41 PDT
Thanks. I had edited TestExpectations, but this recipe will be very helpful.
Comment 19 Darin Adler 2014-05-09 10:33:03 PDT
The problem here is simply cached resources from previous tests. Something needs to clear these caches for tests that rely on resources not being cached. I have started tackling this and I have a first cut patch.
Comment 20 Darin Adler 2014-05-09 10:41:25 PDT
Created attachment 231166 [details]
Patch
Comment 21 Darin Adler 2014-05-09 10:45:44 PDT
I attached my first cut at a patch.

There are a few things I am uncertain about:

1) Should emptying the caches be something the tests do explicitly separately instead of as a side effect of the “serialize loads” test function?

2) Is it valuable to empty the disk caches too? The real issue here is with in-memory caches.

3) Is there some additional consideration with the network process? Probably not, because what is actively causing a problem here is in-web-process memory caching.

4) Does the problem I fixed in +[WebCache empty] with the “just change the caching model twice” technique for emptying caches reflect a bug that affects developers using public WebKit API? Should we fix that?
Comment 22 Alexey Proskuryakov 2014-05-09 10:53:19 PDT
I'm super curious why fast/dom/clone-node-load-event-crash.html is necessary to reproduce this.
Comment 23 Darin Adler 2014-05-09 11:08:33 PDT
(In reply to comment #22)
> I'm super curious why fast/dom/clone-node-load-event-crash.html is necessary to reproduce this.

Sadly, I’m out of time to work on this today, so I won't have a chance to answer that question today.
Comment 24 Darin Adler 2014-05-10 07:51:24 PDT
Comment on attachment 231166 [details]
Patch

This is a workaround, not the real fix.
Comment 25 Darin Adler 2014-05-10 07:59:35 PDT
Created attachment 231226 [details]
Patch
Comment 26 WebKit Commit Bot 2014-05-10 08:02:16 PDT
Attachment 231226 [details] did not pass style-queue:


ERROR: Source/WebCore/style/StyleResolveTree.cpp:992:  Missing space before {  [whitespace/braces] [5]
Total errors found: 1 in 3 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 27 Darin Adler 2014-05-10 10:52:21 PDT
Comment on attachment 231226 [details]
Patch

This is the real fix.
Comment 28 Darin Adler 2014-05-10 12:51:01 PDT
Committed r168580: <http://trac.webkit.org/changeset/168580>