Bug 38146 - document.write is not synchronous after page load
Summary: document.write is not synchronous after page load
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: DOM (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Joseph Pecoraro
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2010-04-26 15:52 PDT by Joseph Pecoraro
Modified: 2010-05-13 17:05 PDT (History)
8 users (show)

See Also:


Attachments
[TEST] Manual Test - Beware this may crash some browsers. (1.28 KB, text/html)
2010-04-26 16:16 PDT, Joseph Pecoraro
no flags Details
[PATCH] Force Document.write to be Synchronous (2.50 KB, patch)
2010-04-26 16:28 PDT, Joseph Pecoraro
mjs: review+
Details | Formatted Diff | Diff
[PATCH] Addressed Comments - Added Test (5.92 KB, patch)
2010-05-03 09:46 PDT, Joseph Pecoraro
no flags Details | Formatted Diff | Diff
[PATCH] Fixed a Few Style Issues (5.92 KB, patch)
2010-05-03 10:01 PDT, Joseph Pecoraro
abarth: review-
Details | Formatted Diff | Diff
[PATCH] Fixed Comments - Added RAII Class (7.11 KB, patch)
2010-05-05 16:19 PDT, Joseph Pecoraro
abarth: review+
abarth: commit-queue-
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Joseph Pecoraro 2010-04-26 15:52:40 PDT
HTML5 describes document.write as:
http://www.whatwg.org/specs/web-apps/current-work/multipage/apis-in-html-documents.html#document.write()

Part 5 describes processing the characters one at a time until reaching the insertion point. This is not 100% followed by WebKit. WebKit can break early in the HTMLTokenizer and put the remaining work on a timer. This may result in incorrect results if running DOM operations immediately after a document.write.

This is difficult to test. The only idea I have had is processing a very large string to hit the 500ms limit. This is not future proof, as there could always be a CPU fast enough to complete the write() in less than the time limit. Also, a large enough string may hang the browser.

Is it desired that we follow the spec here?

Other Browsers:

  Firefox 3.6 is synchronous or hangs.
  Chrome 5.0.342.9 is synchronous or hangs.
  Safari 4.0.4 is like WebKit and breaks after a long enough time.

Example of Issue:

  var doc = document.implementation.createHTMLDocument();
  doc.write( sufficientlyLongString );
  doc.getElementsByTagName("*").length; // DOM Operation may have undefined results

WebKit will eventually complete the parsing, but it just fills in the DOM in the background. There is no way to know when it is complete.
Comment 1 Joseph Pecoraro 2010-04-26 16:16:45 PDT
Created attachment 54344 [details]
[TEST] Manual Test - Beware this may crash some browsers.

This test builds a very long string and passes it to document.write, then does some DOM queries immediately after the write call. Sample output of WebKit shows the undesired behavior:

Sample output:

>  49987 nodes immediately after the call (NOTE this number may vary)
>  65539 nodes 100 milliseconds later
>  65539 nodes 1 second later
Comment 2 Joseph Pecoraro 2010-04-26 16:28:48 PDT
Created attachment 54345 [details]
[PATCH] Force Document.write to be Synchronous

Parts 1-4 of the HTML5 spec still apply, and document.write with pending scripts will still break. However, if it gets past that point in HTMLTokenizer::write then the data in document.write will be parsed synchronously.
Comment 3 Joseph Pecoraro 2010-04-28 22:31:59 PDT
Anyone want to comment on this? I'll try asking some people in #webkit tomorrow.
Comment 4 David Kilzer (:ddkilzer) 2010-04-30 10:22:51 PDT
(In reply to comment #3)
> Anyone want to comment on this? I'll try asking some people in #webkit
> tomorrow.

Added Hixie in case he would like to comment.
Comment 5 Joseph Pecoraro 2010-05-01 14:11:35 PDT
It was asked that I do some more testing and impact research:

> I think matching other browsers is cool and a good enough reason
> to do something ... but more testing / compat impact info would be
> really valuable

Here goes:


How this patch affects behavior:
--------------------------------

  1. Typical page load times should be unaffected. This is because
  any document.write which happens in a <script> on page load time
  will increment m_executingScript and cause the document.write to
  be synchronous anyways.
  
  2. Entire pages written with document.write (where it writes over
  the existing page) will be affected. I haven't seen usage of this,
  and it appears to be discouraged at all of the above articles.
  
  3. However, in those cases we will match the spec and other
  browsers. I have tested on Mac OS X 10.6 - Firefox 3.6,
  Chrome 5.0.342.9, Opera 10.53


Test Cases:
-----------

I tested the state of the tokenizer right before it makes its decision to yield or not. I show below how the state of the tokenizer will change, and whether or not that will change behavior (as I mentioned above).

  1. Inline <script> document.write on page load
      
      state.loadingExtScript(): false
      state.forceSynchronous(): false  [will become true, but no change]
      m_executingScript       : true
      allowedYield            : false

  2. External <script src="..."> with document.write on page load

      state.loadingExtScript(): true
      state.forceSynchronous(): false  [will become true, but no change]
      m_executingScript       : true
      allowedYield            : false
      
  3. document.write after page is done loading

      state.loadingExtScript(): false/true
      state.forceSynchronous(): false  [will become true, changes behavior]
      m_executingScript       : true
      allowedYield            : false

Case 3 is what I mention above. Below is some research I did to see how this could affect things. For instance the "Public perception" or "what the web is saying" about document.write:


Top Google Hits:
----------------

- http://javascript.about.com/library/blwrite.htm
Advises use only while the page is loading, because otherwise it writes over the existing page content. Also, document.write doesn't apply to XML and therefore XHTML.


- http://www.sitepoint.com/blogs/2007/07/11/insert-in-place-without-documentwrite/
Advises DOM creation instead of document.write.

>   And there you have it – a simple but elegant way of inserting
>   content in place, removing the last vestige of need for
>   document.write!


- http://java-programming.suite101.com/article.cfm/javascript_documentwrite_tutorial
2008 article describing how to use it and why it is useful. This article does not mention any alternatives. It does warn that usage should be restricted to page loading:

>  The document.write command must be carried out during the loading of
>  the page. So, if it is attached to any event that executes after the
>  page has loaded, then the whole page will be replaced with the
>  contents of the document.write command.

However you can still find recent articles proposing document.write as useful:

- http://www.joshstauffer.com/ways-to-use-document-write/

>  The document.write command is a standard, yet powerful, JavaScript
>  command for writing output to a page. In this article I’ve outline
>  3 ways to use document.write statements.


Common document.write usage across the Web:
-------------------------------------------

- Google Analytics
http://code.google.com/apis/analytics/docs/tracking/gaTrackingOverview.html
The "traditional" snippet used across the web for free analytics uses document.write. However its usage should be unaffected by this patch. Its also worth noting that the newer asynchronous tracking snippet is described as "an improved way", and this does not use document.write.


Summary:
--------

I don't see anyone recommending usage of document.write after page load. Its possible that I just haven't seen it. Its arguably more likely that document.implementation.createHTMLDocument and write() on the newly created document could be used on some sites, but "document.implementation" is not available on all browsers and in those cases we would start matching other browser's behavior and give the results users would expect by matching WebKit's own behavior of document.write on page loads.
Comment 6 Joseph Pecoraro 2010-05-01 14:17:10 PDT
Correction to my above comment. In Test Case 3: m_executingScript should be false. A copy & paste error on my part. This flag was not set for me when I executed a script in window.onload.
Comment 7 Maciej Stachowiak 2010-05-01 17:41:05 PDT
Comment on attachment 54345 [details]
[PATCH] Force Document.write to be Synchronous

Code change looks good, but I have two suggestions for improving this patch:

1) it would probably be good to mention the external script exception to synchronous behavior in the ChangeLog
2) it would be good to add a test case, even if it only  fails part of the time in the bad case, or even if it is a manual test

r- for now for the lack of test. I'll gladly r+ if the points above are addressed.
Comment 8 Eric Seidel (no email) 2010-05-02 19:32:22 PDT
Attachment 54345 [details] was posted by a committer and has review+, assigning to Joseph Pecoraro for commit.
Comment 9 Joseph Pecoraro 2010-05-03 09:46:19 PDT
Created attachment 54934 [details]
[PATCH] Addressed Comments - Added Test

> 1) it would probably be good to mention the external script exception to
> synchronous behavior in the ChangeLog

Sure, I've added a comment to the code as well as the ChangeLog.


> 2) it would be good to add a test case, even if it only  fails part of the time
> in the bad case, or even if it is a manual test

I added a LayoutTest in fast/dom since I'm told "fast" is a misnomer. This isn't really a fast test.
Comment 10 Joseph Pecoraro 2010-05-03 09:47:21 PDT
Updated bug title.
Comment 11 mitz 2010-05-03 09:48:25 PDT
Comment on attachment 54934 [details]
[PATCH] Addressed Comments - Added Test

> +    HTMLTokenizer* htmlTokenizer = (m_tokenizer->isHTMLTokenizer() ? static_cast<HTMLTokenizer*>(m_tokenizer.get()) : 0);

Redundant parentheses.
Comment 12 Joseph Pecoraro 2010-05-03 10:01:52 PDT
Created attachment 54935 [details]
[PATCH] Fixed a Few Style Issues
Comment 13 Joseph Pecoraro 2010-05-04 15:41:13 PDT
Maciej I added a new patch addressing your comments (add test and comments). Just noticed you weren't CC'd.
Comment 14 Adam Barth 2010-05-05 10:15:11 PDT
Comment on attachment 54935 [details]
[PATCH] Fixed a Few Style Issues

WebCore/dom/Document.cpp:1963
 +      HTMLTokenizer* htmlTokenizer = (m_tokenizer->isHTMLTokenizer() ? static_cast<HTMLTokenizer*>(m_tokenizer.get()) : 0);
No outer parenthesis needed.

WebCore/dom/Document.cpp:1963
 +      HTMLTokenizer* htmlTokenizer = (m_tokenizer->isHTMLTokenizer() ? static_cast<HTMLTokenizer*>(m_tokenizer.get()) : 0);
This seems like it should be an asHTMLTokenzer method of m_tokenizer.  Does this pattern occur elsewhere in this file?  RTTI is usually a bad idea.

WebCore/dom/Document.cpp:1972
 +          htmlTokenizer->setForceSynchronous(savedForceSynchronous);
Can we do this with a RAII class?  The implementation you have is error prone.
Comment 15 Joseph Pecoraro 2010-05-05 16:19:08 PDT
Created attachment 55169 [details]
[PATCH] Fixed Comments - Added RAII Class

> No outer parenthesis needed.

Arg, I attached the same patch for patch 3 instead of the one fixing style issues. Thanks.


> This seems like it should be an asHTMLTokenzer method of m_tokenizer.  Does
> this pattern occur elsewhere in this file?  RTTI is usually a bad idea.

I'm not sure how useful this will be. I still need to do the same thing I am doing, checking if it is an HTMLTokenizer and casting it. But I haven't had much experience using this idiom.

Document.cpp uses the following pattern a lot, which is similar:

    if (node->hasLocalName(someTag))
        sheet = static_cast<HTMLSomeElement*>(n)->...

There is one other use of `isHTMLTokenizer()`, in HTMLFormControlElement.cpp.


> WebCore/dom/Document.cpp:1972
>  +          htmlTokenizer->setForceSynchronous(savedForceSynchronous);
> Can we do this with a RAII class?  The implementation you have is error prone.

Added this in the recent patch. I'd appreciate feedback on the name to make it clearer what this is doing.

I'm still worried about this test being too long. It takes ~7 seconds on my machine. If anyone else thinks that is too long I'll gladly make this a manual test.
Comment 16 Joseph Pecoraro 2010-05-05 16:22:57 PDT
> ~7 seconds on my machine

I should clarify that this is In a debug build. It should be much faster in a Release build.
Comment 17 Adam Barth 2010-05-06 16:50:47 PDT
Comment on attachment 55169 [details]
[PATCH] Fixed Comments - Added RAII Class

Looks good.  A 7 second test is somewhat worrying.  Please address the comments below before landing.

LayoutTests/fast/dom/script-tests/document-write-synchronous.js:3
 +  var immediateElementCount, delayedElementCount;
I'm not sure this does what you think it does.

WebCore/dom/Document.cpp:217
 +          m_htmlTokenizer = asHTMLTokenizer(tokenizer);
Oh, I meant to add asHTMLTokenizer as a virtual function on Tokenizer to avoid the static_cast.

WebCore/html/HTMLTokenizer.h:436
 +      ASSERT(tokenizer->isHTMLTokenizer());
Because write only works on HTML documents?  I think it would be better to make this a virtual function and get rid of the static cast.

WebCore/dom/Document.cpp:1984
 +      SynchronousHTMLTokenizerGuard tokenizerGuard(m_tokenizer.get());
Do we want to use  { } to control the scope of this explicitly?
Comment 18 Joseph Pecoraro 2010-05-06 18:22:35 PDT
>  > +          m_htmlTokenizer = asHTMLTokenizer(tokenizer);
> 
>  > +      ASSERT(tokenizer->isHTMLTokenizer());
>
> I think it would be better to make this a virtual function and get rid of the static cast.

Done.


> WebCore/dom/Document.cpp:1984
>  +      SynchronousHTMLTokenizerGuard tokenizerGuard(m_tokenizer.get());
> Do we want to use  { } to control the scope of this explicitly?

Done.


> >  +  var immediateElementCount, delayedElementCount;
>
> I'm not sure this does what you think it does.

I made my test a manual test. Maciej mentioned it would be okay and I confirmed that Adam would still be okay with it. So, this is no longer necessary, but I can explain it.

This makes two global variables. That way they can be used in shouldBe() which uses eval() with the global scope and so these variables would need to be in the global scope.
Comment 19 Joseph Pecoraro 2010-05-07 08:45:47 PDT
Committed r58950
	M	WebCore/dom/Tokenizer.h
	M	WebCore/dom/Document.cpp
	M	WebCore/ChangeLog
	M	WebCore/html/HTMLTokenizer.h
	A	WebCore/manual-tests/dom/document-write-synchronous-after-page-load.html
r58950 = cb3dabac1b45534c1fac494a3de74c9a141439a4 (refs/remotes/trunk)
http://trac.webkit.org/changeset/58950
Comment 20 Alexey Proskuryakov 2010-05-13 17:05:57 PDT
<rdar://problem/7899368>

This has caused bug 39008.