Bug 130942

Summary: REGRESSION (r166853): fast/preloader/document-write.html is very flaky
Product: WebKit Reporter: Alexey Proskuryakov <ap>
Component: DOMAssignee: Darin Adler <darin>
Status: RESOLVED FIXED    
Severity: Normal CC: beidson, commit-queue, darin, koivisto, slewis
Priority: P2 Keywords: Regression
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Attachments:
Description Flags
(huge) command to reproduce locally
none
Patch
none
Patch andersca: review+

Alexey Proskuryakov
Reported 2014-03-30 01:37:11 PDT
fast/preloader/document-write.html started to fail very very frequently last week, I suspect <http://trac.webkit.org/changeset/166144>. The diff seems to always be the same: @@ -1,4 +1,3 @@ -document-write-plaintext.js has MIME type text/javascript script1.js has MIME type text/javascript This test requires DumpRenderTree to see the log of what resources are loaded.
Attachments
(huge) command to reproduce locally (102.68 KB, text/plain)
2014-05-06 22:16 PDT, Alexey Proskuryakov
no flags
Patch (11.68 KB, patch)
2014-05-09 10:41 PDT, Darin Adler
no flags
Patch (3.57 KB, patch)
2014-05-10 07:59 PDT, Darin Adler
andersca: review+
Darin Adler
Comment 1 2014-03-31 09:45:25 PDT
Seems likely the test has a race condition in it. Let me take a look at how it works.
Darin Adler
Comment 2 2014-03-31 09:51:41 PDT
Is the evidence pointing to r166144 strong? I ask because after reading and re-reading that patch it seems highly unlikely that change would affect a test that does not involve <object> or <embed> elements. I think that rolling out that change because of this test failure would be a big mistake. The failure does not reflect what this test is trying to test. The point of this test is whether the preload scanner kicks off the load of script1.js or not, and clearly that is still happening. It’s just that the test is fragile for some reason. Antti, do you have any idea what could be making this test sometimes not log the load of document-write-plaintext.js?
Alexey Proskuryakov
Comment 3 2014-03-31 10:07:33 PDT
This test now fails about 50% of the time on some of the bots, and the earliest detected failure was on r166144:r166145. Preceding changes seem much less relevant, so yes, I would say that the evidence is fairly strong. Complete history: http://webkit-test-results.appspot.com/dashboards/flakiness_dashboard.html#showAllRuns=true&tests=fast%2Fpreloader%2Fdocument-write.html If we can't roll out this change, and if cannot fix it, then we should definitely disable or remove the test. It's pure noise now, only confirming that it doesn't crash.
Darin Adler
Comment 4 2014-03-31 10:18:50 PDT
(In reply to comment #3) > If we can't roll out this change We can, it’s just not necessarily worth it. > if cannot fix it No clue on the cause of the different behavior yet. > we should definitely disable or remove the test. Maybe. > It's pure noise now, only confirming that it doesn't crash. Well, the fact that script1.js appears in the output meaningfully indicates that what the test was designed to test is still working; preloading is occurring. It’s unfortunate that the test machinery classifies that as a failure. Whether document-write-plaintext.js shows up in the output or not is not relevant to what the test is trying to test. So the test would still have some value if we could find a way to ignore the irrelevant part. And also it would be interesting to know why the load of document-write-plaintext.js does not always log. Maybe there’s some previous test that loads it and it is cached?
Alexey Proskuryakov
Comment 5 2014-03-31 17:06:34 PDT
document-write-plaintext.js is also used in fast/preloader/document-write-2.html (which comes before fast/preloader/document-write.html, counterintuitively). I tried to reproduce this issue locally to see if making the resource name unique would resolve it, but I couldn't make it happen. So, I just marked the test as flaky for now in <http://trac.webkit.org/r166548>, to avoid increasing confusion. Also, I suspect that the test is inherently flaky, because preloading could happen before JS execution starts, and thus before we call testRunner.dumpResourceResponseMIMETypes(). Assigning to Darin per his request.
Antti Koivisto
Comment 6 2014-04-01 03:06:29 PDT
(In reply to comment #5) > document-write-plaintext.js is also used in fast/preloader/document-write-2.html (which comes before fast/preloader/document-write.html, counterintuitively). > > I tried to reproduce this issue locally to see if making the resource name unique would resolve it, but I couldn't make it happen. It might be good to just land the change. That sound like a very potential fix. > Also, I suspect that the test is inherently flaky, because preloading could happen before JS execution starts, and thus before we call testRunner.dumpResourceResponseMIMETypes(). The script tags for the test resources are themselves inserted by document.write. Preload scanner can't see them before the script runs, in strict document order. It would be good to come up with a more direct mechanism for testing preload scanner and resource loads in general.
Stephanie Lewis
Comment 7 2014-04-02 16:58:30 PDT
Due to PLT issues 166144 was rolled out in http://trac.webkit.org/changeset/166680
Alexey Proskuryakov
Comment 8 2014-04-02 23:22:32 PDT
Looks good after the rollout. Removed expectation in <http://trac.webkit.org/r166699>.
Alexey Proskuryakov
Comment 9 2014-04-09 09:48:08 PDT
The offending code got re-landed in <http://trac.webkit.org/r166853>, and the test is flaky again. Added a test expectation in <http://trac.webkit.org/r167015>.
Alexey Proskuryakov
Comment 10 2014-05-01 09:32:43 PDT
What is the next step on this? The change was intended to be a refactoring with no side effects, but it broke a test. We have more object-related regressions in the tree, from other changes. See also bug 132297, and <http://webkit-test-results.appspot.com/dashboards/flakiness_dashboard.html#showAllRuns=true&tests=plugins%2Fembed-attributes-style.html> for which there is no bug at the moment.
Darin Adler
Comment 11 2014-05-05 20:14:43 PDT
(In reply to comment #10) > What is the next step on this? The change was intended to be a refactoring with no side effects, but it broke a test. The next step is that I will fix this failure when I have a chance. > We have more object-related regressions in the tree, from other changes. See also bug 132297 Is that object-related?
Alexey Proskuryakov
Comment 12 2014-05-05 22:41:21 PDT
> Is that object-related? Yes, Tim's analysis in that bug appears to imply so. The test content is all in an <object>, and layout state is not properly synchronized with main document. This is probably best to discuss in bug 132297. It's unfortunate that we have the additional flakiness from r166853, but I have no evidence that it's directly related to that bug.
Darin Adler
Comment 13 2014-05-06 20:48:42 PDT
Any tips on how to reproduce the failure locally? I am running the test over and over again without seeing any failures.
Alexey Proskuryakov
Comment 14 2014-05-06 22:16:04 PDT
Created attachment 230971 [details] (huge) command to reproduce locally This is quite hard to reproduce locally indeed. The only way I could reproduce was by replicating the exact sequence of tests as run by a single instance of DumpRenderTree on a bot. This can undoubtedly be simplified through bisection, but running only the last few tests is insufficient. The attached command does two iterations of that. On first iteration, fast/preloader/document-write.html fails. On second iteration, fast/preloader/document-write-2.html and fast/preloader/document-write.html both fail.
Alexey Proskuryakov
Comment 15 2014-05-06 22:22:53 PDT
I'm bisecting now.
Alexey Proskuryakov
Comment 16 2014-05-06 22:32:11 PDT
Here is the minimal sequence to reproduce: run-webkit-tests --child-processes 1 --no-retry-failures fast/dom/clone-node-load-event-crash.html fast/preloader/document-write-2.html fast/preloader/document-write.html This fails too: run-webkit-tests --child-processes 1 --no-retry-failures fast/dom/clone-node-load-event-crash.html fast/preloader/document-write-2.html fast/preloader/document-write-2.html
Alexey Proskuryakov
Comment 17 2014-05-06 22:55:04 PDT
An additional step, possibly obvious: comment out the entry for fast/preloader/document-write.html in platform/mac/TestExpectations, or run-webkit-tests will say "All 3 tests ran as expected."
Darin Adler
Comment 18 2014-05-07 18:03:41 PDT
Thanks. I had edited TestExpectations, but this recipe will be very helpful.
Darin Adler
Comment 19 2014-05-09 10:33:03 PDT
The problem here is simply cached resources from previous tests. Something needs to clear these caches for tests that rely on resources not being cached. I have started tackling this and I have a first cut patch.
Darin Adler
Comment 20 2014-05-09 10:41:25 PDT
Darin Adler
Comment 21 2014-05-09 10:45:44 PDT
I attached my first cut at a patch. There are a few things I am uncertain about: 1) Should emptying the caches be something the tests do explicitly separately instead of as a side effect of the “serialize loads” test function? 2) Is it valuable to empty the disk caches too? The real issue here is with in-memory caches. 3) Is there some additional consideration with the network process? Probably not, because what is actively causing a problem here is in-web-process memory caching. 4) Does the problem I fixed in +[WebCache empty] with the “just change the caching model twice” technique for emptying caches reflect a bug that affects developers using public WebKit API? Should we fix that?
Alexey Proskuryakov
Comment 22 2014-05-09 10:53:19 PDT
I'm super curious why fast/dom/clone-node-load-event-crash.html is necessary to reproduce this.
Darin Adler
Comment 23 2014-05-09 11:08:33 PDT
(In reply to comment #22) > I'm super curious why fast/dom/clone-node-load-event-crash.html is necessary to reproduce this. Sadly, I’m out of time to work on this today, so I won't have a chance to answer that question today.
Darin Adler
Comment 24 2014-05-10 07:51:24 PDT
Comment on attachment 231166 [details] Patch This is a workaround, not the real fix.
Darin Adler
Comment 25 2014-05-10 07:59:35 PDT
WebKit Commit Bot
Comment 26 2014-05-10 08:02:16 PDT
Attachment 231226 [details] did not pass style-queue: ERROR: Source/WebCore/style/StyleResolveTree.cpp:992: Missing space before { [whitespace/braces] [5] Total errors found: 1 in 3 files If any of these errors are false positives, please file a bug against check-webkit-style.
Darin Adler
Comment 27 2014-05-10 10:52:21 PDT
Comment on attachment 231226 [details] Patch This is the real fix.
Darin Adler
Comment 28 2014-05-10 12:51:01 PDT
Note You need to log in before you can comment on or make changes to this bug.