Bug 89102 - document.write of scripts that also document.write sometimes writes async
Summary: document.write of scripts that also document.write sometimes writes async
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: DOM (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Eric Seidel (no email)
URL: http://tipsmc.com/index/application_f...
Keywords: HasReduction
Depends on: 90845
Blocks:
  Show dependency treegraph
 
Reported: 2012-06-14 07:54 PDT by Maxim
Modified: 2012-07-10 00:27 PDT (History)
8 users (show)

See Also:


Attachments
Script executed with document.write (554 bytes, application/x-javascript)
2012-06-21 14:56 PDT, Elliott Sprehn
no flags Details
Reduction (357 bytes, text/html)
2012-06-21 15:05 PDT, Elliott Sprehn
no flags Details
Zipped Reduction (1.24 KB, application/zip)
2012-06-21 15:11 PDT, Elliott Sprehn
no flags Details
not quite ready for review (20.12 KB, patch)
2012-07-08 23:49 PDT, Eric Seidel (no email)
no flags Details | Formatted Diff | Diff
Patch (25.71 KB, patch)
2012-07-09 13:44 PDT, Eric Seidel (no email)
no flags Details | Formatted Diff | Diff
Patch for landing (26.16 KB, patch)
2012-07-09 14:48 PDT, Eric Seidel (no email)
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Maxim 2012-06-14 07:54:33 PDT
steps:
1. load the page
2. press on link "Анкета абитуриента" at left column in block "ЗАЯВКА НА ОБУЧЕНИЕ"

What do I expect to happen?
Page reloading

What actually happens?
Page reloading without iframe. After reloading, iframe is empty.
Iframe generated by script: 'document.write(...)'.
After that same script trying to get access to new iframe by ID, but can't, because in this type of reloading Safari create iframe _after_ script finished work :(

tested on Safari 5.1.7 and Night Build 7534.57.2
Comment 1 Alexey Proskuryakov 2012-06-14 09:35:22 PDT
I cannot reproduce this as described. I do see that main content is sometimes missing, but given the bug description, it's likely a mistake in site code.

Can you make a reduced test case for this?
Comment 2 Maxim 2012-06-15 02:20:13 PDT
(In reply to comment #1)
> I cannot reproduce this as described.

Author of the site added "onclick" workaround.

But you can use another method:
1. load the page
2. put cursor in address field and press "enter"

second load will be "smart".
Comment 3 Alexey Proskuryakov 2012-06-15 10:39:01 PDT
We need to make a reduced test case to investigate this. Unfortunately, I cannot work on one any time soon.
Comment 4 Elliott Sprehn 2012-06-21 14:56:40 PDT
Created attachment 148892 [details]
Script executed with document.write
Comment 5 Elliott Sprehn 2012-06-21 15:05:46 PDT
Created attachment 148896 [details]
Reduction

Reduction that shows what's needed to see this bug. The problem is when a script is inserted into the document with document.write and then it also does a document.write of new elements and tries to access them immediately sometimes the writes haven't happened yet.

given:

- index.html
  document.write('<script src=example.js></script>");

- example.js
   document.write('<div></div>');
   var div = document.querySelector('div');
   document.body.appendChild(document.createTextNode(div ? 'Exists', 'Missing');

When doing a full page refresh with Apple+R the div is always defined because the document.write happened immediately. Sometimes, but not always, when you put your cursor in the url bar and hit enter (cached refresh) the document.write happens _after_ the example.js is complete yielding two different DOMs:

<script></script>
<div></div>
Exists

OR

<script></script>
Missing
<div></div>
Comment 6 Elliott Sprehn 2012-06-21 15:11:34 PDT
Created attachment 148897 [details]
Zipped Reduction

This bug is a race condition and it seems that the webkit.org server is slow enough that it never reproduces (ex. by just clicking the Reduction link in attachments). This contains the reduction zipped together so you can load the reduction.html locally and then cached load it repeatedly to see the Failure.

I'm not able to reproduce this when loading from the webkit.org server, but it happens quite frequently when loaded locally.
Comment 7 Adam Barth 2012-06-22 11:24:09 PDT
Is the use of document.querySelector essential?  What happens if you just craw the firstChild, nextSibling links in the DOM?
Comment 8 Elliott Sprehn 2012-06-22 11:26:50 PDT
(In reply to comment #7)
> Is the use of document.querySelector essential?  What happens if you just craw the firstChild, nextSibling links in the DOM?

No that was just a quick way to write the test. The document.write hasn't actually happened yet at that point so document.documentElement.innerHTML or firstChild etc. would show that the div is missing.
Comment 9 Adam Barth 2012-06-22 11:30:32 PDT
Very interesting.  We should look at this in the debugger to see what's going on.

If you're having trouble triggering the race, you might try a data URL because that will load much faster.

I wonder if the problem is arising only when the script is in the MemoryCache.  In that case, we'll get the script back synchronously and might get confused.
Comment 10 Elliott Sprehn 2012-06-22 18:36:02 PDT
This race condition has been really difficult to debug. I've traced it down to the parser being paused, so HTMLContructionSite queues the node creation which then happens later.

This seems to be the result of runScriptsForPausedTreeBuilder() which called HTMLScriptRunner->execute which then returned false. We then do setPaused(true) in HTMLDocumentParser::canTakeNextToken.

Super confusingly if I set breakpoints at ::execute, setPaused and ::canTakenNextToken I can get my above example to print "FailureFailure" (even if I step through the breakpoints very fast) so I don't quite understand how setTimeout() in JS and the scheduling code in the parser interacts.

Adam can you take a look?
Comment 11 Adam Barth 2012-06-22 19:06:54 PDT
> Adam can you take a look?

Yes, I'll take a look.  I appreciate the work you've put into reducing and studying this issue.  I can't promise when I'll have time to look at it, but it does sound like an interesting bug.
Comment 12 Eric Seidel (no email) 2012-06-25 10:22:41 PDT
Adam and I looked at this for a while this morning.  We don't have an exact answer for you yet, but this comes from whether the script is in the cache or not.

We've learned that the inner script is always running after the outer script complets, but for some reason the inner document.write is sometimes not executing immediately (presumably getting queued in the tokenizer).  Unclear exactly what branch of code this is yet, as we haven't pulled this up in a debugger.
Comment 13 Eric Seidel (no email) 2012-07-02 18:04:09 PDT
OK.  Adam and I spent the last hour staring at this in the debugger, and I believe we understand what's going on.

As mentioned before, the root difference in behavior comes from when the script is already in the MemoryCache, so the load completes immediately ("notifyFinished" is called on the client from within the load function.)

The outer (inline) script runs, through HTMLScriptRunner::execute, which calls HTMLScriptRunner::runScript, which in turn calls ScriptElement::prepareScript, which because the script is inline (has no src attribute), actually runs the script.

As part of that outer (inline) script running, it document.write's a <script> tag to the (external) script, which pumps the parser, and then makes a nested call to HTMLScriptRunner::runScript:
http://trac.webkit.org/browser/trunk/Source/WebCore/html/parser/HTMLScriptRunner.cpp#L276
It is not inline, so it is not executed as part of ScriptElement::prepareScript, but rather made the m_parserBlockingScript by requestParsingBlockingScript at the end of the nested runScript call.

However, when the outer (inline) script's runScript completes, and unrolls to the outer HTMLScriptRunner::execute, this is where we get in trouble.  We now have parser blocking script:
http://trac.webkit.org/browser/trunk/Source/WebCore/html/parser/HTMLScriptRunner.cpp#L172
and so we try to execute it.  However, this branch does *not* unpause the treebuilder, as would be the case were we to wait for the script to complete from the network:
http://trac.webkit.org/browser/trunk/Source/WebCore/html/parser/HTMLDocumentParser.cpp#L500
(note the setPause(false) there.)

This is a (my) design error, as part of making the HTMLScriptRunner a "leaf" in this object graph.  Ideally the HTMLScriptRunner would know nothing about the HTMLTreeBuilder and the HTMLTreeBuilder would know nothing about the HTMLScriptRunner.  All communication/coordination is done through the HTMLDocumentParser who is actually in charge.

So one theory that Adam and I have come up with for fixing this, is to remove this isPaused bool on the HTMLTreeBuilder (as it is really just a proxy for HTMLTreeBuilder::m_scriptToProcess), and instead add a HTMLDocumentParser::shouldContinueParsing() function which knows how to check both HTMLTreeBuilder::m_scriptToProcess and HTMLScriptRunner::m_parseBlockingScript.  This would make it so that when the document parser passes the script from the tree builder to the script runner, the treebuilder would automatically "un-pause" (in the old parlance), but that parsing would not continue until the script runner no longer thought it had a parsing blocking script (which is cleared right as we start executing a script).

During this investigation, we also discovered some oddness for handling the cases of inline scripts which are blocked on stylesheet loads.  There may be some bugs similar to this one in that handling:
http://trac.webkit.org/browser/trunk/Source/WebCore/html/parser/HTMLScriptRunner.cpp#L303
Comment 14 Eric Seidel (no email) 2012-07-08 23:49:16 PDT
Created attachment 151193 [details]
not quite ready for review
Comment 15 Eric Seidel (no email) 2012-07-08 23:49:34 PDT
I'll write a nice long ChangeLog when I'm less tired.
Comment 16 Adam Barth 2012-07-09 08:48:43 PDT
Looks like a solid approach.  I'm glad you were able to reduce the total amount of state in the machine.
Comment 17 Tony Gentilcore 2012-07-09 09:10:46 PDT
(In reply to comment #16)
> Looks like a solid approach.  I'm glad you were able to reduce the total amount of state in the machine.

lgtm too
Comment 18 Eric Seidel (no email) 2012-07-09 13:44:52 PDT
Created attachment 151309 [details]
Patch
Comment 19 Adam Barth 2012-07-09 14:34:12 PDT
Comment on attachment 151309 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=151309&action=review

> Source/WebCore/html/parser/HTMLDocumentParser.cpp:468
> +    return m_treeBuilder->hasParserBlockingScript() || (m_scriptRunner && m_scriptRunner->hasParserBlockingScript());

Should we ASSERT that the && of these conditions never holds?
Comment 20 Eric Seidel (no email) 2012-07-09 14:48:35 PDT
Created attachment 151323 [details]
Patch for landing
Comment 21 WebKit Review Bot 2012-07-09 16:48:33 PDT
Comment on attachment 151323 [details]
Patch for landing

Clearing flags on attachment: 151323

Committed r122168: <http://trac.webkit.org/changeset/122168>
Comment 22 WebKit Review Bot 2012-07-09 16:48:39 PDT
All reviewed patches have been landed.  Closing bug.
Comment 23 Eric Seidel (no email) 2012-07-10 00:27:11 PDT
This appears to have caused test flakiness.  I'll look tomorrow, but I may have to roll this out until I can figure out why.