Bug 38146

Summary: document.write is not synchronous after page load
Product: WebKit Reporter: Joseph Pecoraro <joepeck>
Component: DOMAssignee: Joseph Pecoraro <joepeck>
Status: RESOLVED FIXED    
Severity: Normal CC: ap, ddkilzer, eric, ian, joepeck, mjs, ml, simon.fraser
Priority: P2 Keywords: InRadar
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Attachments:
Description Flags
[TEST] Manual Test - Beware this may crash some browsers.
none
[PATCH] Force Document.write to be Synchronous
mjs: review+
[PATCH] Addressed Comments - Added Test
none
[PATCH] Fixed a Few Style Issues
abarth: review-
[PATCH] Fixed Comments - Added RAII Class abarth: review+, abarth: commit-queue-

Joseph Pecoraro
Reported 2010-04-26 15:52:40 PDT
HTML5 describes document.write as: http://www.whatwg.org/specs/web-apps/current-work/multipage/apis-in-html-documents.html#document.write() Part 5 describes processing the characters one at a time until reaching the insertion point. This is not 100% followed by WebKit. WebKit can break early in the HTMLTokenizer and put the remaining work on a timer. This may result in incorrect results if running DOM operations immediately after a document.write. This is difficult to test. The only idea I have had is processing a very large string to hit the 500ms limit. This is not future proof, as there could always be a CPU fast enough to complete the write() in less than the time limit. Also, a large enough string may hang the browser. Is it desired that we follow the spec here? Other Browsers: Firefox 3.6 is synchronous or hangs. Chrome 5.0.342.9 is synchronous or hangs. Safari 4.0.4 is like WebKit and breaks after a long enough time. Example of Issue: var doc = document.implementation.createHTMLDocument(); doc.write( sufficientlyLongString ); doc.getElementsByTagName("*").length; // DOM Operation may have undefined results WebKit will eventually complete the parsing, but it just fills in the DOM in the background. There is no way to know when it is complete.
Attachments
[TEST] Manual Test - Beware this may crash some browsers. (1.28 KB, text/html)
2010-04-26 16:16 PDT, Joseph Pecoraro
no flags
[PATCH] Force Document.write to be Synchronous (2.50 KB, patch)
2010-04-26 16:28 PDT, Joseph Pecoraro
mjs: review+
[PATCH] Addressed Comments - Added Test (5.92 KB, patch)
2010-05-03 09:46 PDT, Joseph Pecoraro
no flags
[PATCH] Fixed a Few Style Issues (5.92 KB, patch)
2010-05-03 10:01 PDT, Joseph Pecoraro
abarth: review-
[PATCH] Fixed Comments - Added RAII Class (7.11 KB, patch)
2010-05-05 16:19 PDT, Joseph Pecoraro
abarth: review+
abarth: commit-queue-
Joseph Pecoraro
Comment 1 2010-04-26 16:16:45 PDT
Created attachment 54344 [details] [TEST] Manual Test - Beware this may crash some browsers. This test builds a very long string and passes it to document.write, then does some DOM queries immediately after the write call. Sample output of WebKit shows the undesired behavior: Sample output: > 49987 nodes immediately after the call (NOTE this number may vary) > 65539 nodes 100 milliseconds later > 65539 nodes 1 second later
Joseph Pecoraro
Comment 2 2010-04-26 16:28:48 PDT
Created attachment 54345 [details] [PATCH] Force Document.write to be Synchronous Parts 1-4 of the HTML5 spec still apply, and document.write with pending scripts will still break. However, if it gets past that point in HTMLTokenizer::write then the data in document.write will be parsed synchronously.
Joseph Pecoraro
Comment 3 2010-04-28 22:31:59 PDT
Anyone want to comment on this? I'll try asking some people in #webkit tomorrow.
David Kilzer (:ddkilzer)
Comment 4 2010-04-30 10:22:51 PDT
(In reply to comment #3) > Anyone want to comment on this? I'll try asking some people in #webkit > tomorrow. Added Hixie in case he would like to comment.
Joseph Pecoraro
Comment 5 2010-05-01 14:11:35 PDT
It was asked that I do some more testing and impact research: > I think matching other browsers is cool and a good enough reason > to do something ... but more testing / compat impact info would be > really valuable Here goes: How this patch affects behavior: -------------------------------- 1. Typical page load times should be unaffected. This is because any document.write which happens in a <script> on page load time will increment m_executingScript and cause the document.write to be synchronous anyways. 2. Entire pages written with document.write (where it writes over the existing page) will be affected. I haven't seen usage of this, and it appears to be discouraged at all of the above articles. 3. However, in those cases we will match the spec and other browsers. I have tested on Mac OS X 10.6 - Firefox 3.6, Chrome 5.0.342.9, Opera 10.53 Test Cases: ----------- I tested the state of the tokenizer right before it makes its decision to yield or not. I show below how the state of the tokenizer will change, and whether or not that will change behavior (as I mentioned above). 1. Inline <script> document.write on page load state.loadingExtScript(): false state.forceSynchronous(): false [will become true, but no change] m_executingScript : true allowedYield : false 2. External <script src="..."> with document.write on page load state.loadingExtScript(): true state.forceSynchronous(): false [will become true, but no change] m_executingScript : true allowedYield : false 3. document.write after page is done loading state.loadingExtScript(): false/true state.forceSynchronous(): false [will become true, changes behavior] m_executingScript : true allowedYield : false Case 3 is what I mention above. Below is some research I did to see how this could affect things. For instance the "Public perception" or "what the web is saying" about document.write: Top Google Hits: ---------------- - http://javascript.about.com/library/blwrite.htm Advises use only while the page is loading, because otherwise it writes over the existing page content. Also, document.write doesn't apply to XML and therefore XHTML. - http://www.sitepoint.com/blogs/2007/07/11/insert-in-place-without-documentwrite/ Advises DOM creation instead of document.write. > And there you have it – a simple but elegant way of inserting > content in place, removing the last vestige of need for > document.write! - http://java-programming.suite101.com/article.cfm/javascript_documentwrite_tutorial 2008 article describing how to use it and why it is useful. This article does not mention any alternatives. It does warn that usage should be restricted to page loading: > The document.write command must be carried out during the loading of > the page. So, if it is attached to any event that executes after the > page has loaded, then the whole page will be replaced with the > contents of the document.write command. However you can still find recent articles proposing document.write as useful: - http://www.joshstauffer.com/ways-to-use-document-write/ > The document.write command is a standard, yet powerful, JavaScript > command for writing output to a page. In this article I’ve outline > 3 ways to use document.write statements. Common document.write usage across the Web: ------------------------------------------- - Google Analytics http://code.google.com/apis/analytics/docs/tracking/gaTrackingOverview.html The "traditional" snippet used across the web for free analytics uses document.write. However its usage should be unaffected by this patch. Its also worth noting that the newer asynchronous tracking snippet is described as "an improved way", and this does not use document.write. Summary: -------- I don't see anyone recommending usage of document.write after page load. Its possible that I just haven't seen it. Its arguably more likely that document.implementation.createHTMLDocument and write() on the newly created document could be used on some sites, but "document.implementation" is not available on all browsers and in those cases we would start matching other browser's behavior and give the results users would expect by matching WebKit's own behavior of document.write on page loads.
Joseph Pecoraro
Comment 6 2010-05-01 14:17:10 PDT
Correction to my above comment. In Test Case 3: m_executingScript should be false. A copy & paste error on my part. This flag was not set for me when I executed a script in window.onload.
Maciej Stachowiak
Comment 7 2010-05-01 17:41:05 PDT
Comment on attachment 54345 [details] [PATCH] Force Document.write to be Synchronous Code change looks good, but I have two suggestions for improving this patch: 1) it would probably be good to mention the external script exception to synchronous behavior in the ChangeLog 2) it would be good to add a test case, even if it only fails part of the time in the bad case, or even if it is a manual test r- for now for the lack of test. I'll gladly r+ if the points above are addressed.
Eric Seidel (no email)
Comment 8 2010-05-02 19:32:22 PDT
Attachment 54345 [details] was posted by a committer and has review+, assigning to Joseph Pecoraro for commit.
Joseph Pecoraro
Comment 9 2010-05-03 09:46:19 PDT
Created attachment 54934 [details] [PATCH] Addressed Comments - Added Test > 1) it would probably be good to mention the external script exception to > synchronous behavior in the ChangeLog Sure, I've added a comment to the code as well as the ChangeLog. > 2) it would be good to add a test case, even if it only fails part of the time > in the bad case, or even if it is a manual test I added a LayoutTest in fast/dom since I'm told "fast" is a misnomer. This isn't really a fast test.
Joseph Pecoraro
Comment 10 2010-05-03 09:47:21 PDT
Updated bug title.
mitz
Comment 11 2010-05-03 09:48:25 PDT
Comment on attachment 54934 [details] [PATCH] Addressed Comments - Added Test > + HTMLTokenizer* htmlTokenizer = (m_tokenizer->isHTMLTokenizer() ? static_cast<HTMLTokenizer*>(m_tokenizer.get()) : 0); Redundant parentheses.
Joseph Pecoraro
Comment 12 2010-05-03 10:01:52 PDT
Created attachment 54935 [details] [PATCH] Fixed a Few Style Issues
Joseph Pecoraro
Comment 13 2010-05-04 15:41:13 PDT
Maciej I added a new patch addressing your comments (add test and comments). Just noticed you weren't CC'd.
Adam Barth
Comment 14 2010-05-05 10:15:11 PDT
Comment on attachment 54935 [details] [PATCH] Fixed a Few Style Issues WebCore/dom/Document.cpp:1963 + HTMLTokenizer* htmlTokenizer = (m_tokenizer->isHTMLTokenizer() ? static_cast<HTMLTokenizer*>(m_tokenizer.get()) : 0); No outer parenthesis needed. WebCore/dom/Document.cpp:1963 + HTMLTokenizer* htmlTokenizer = (m_tokenizer->isHTMLTokenizer() ? static_cast<HTMLTokenizer*>(m_tokenizer.get()) : 0); This seems like it should be an asHTMLTokenzer method of m_tokenizer. Does this pattern occur elsewhere in this file? RTTI is usually a bad idea. WebCore/dom/Document.cpp:1972 + htmlTokenizer->setForceSynchronous(savedForceSynchronous); Can we do this with a RAII class? The implementation you have is error prone.
Joseph Pecoraro
Comment 15 2010-05-05 16:19:08 PDT
Created attachment 55169 [details] [PATCH] Fixed Comments - Added RAII Class > No outer parenthesis needed. Arg, I attached the same patch for patch 3 instead of the one fixing style issues. Thanks. > This seems like it should be an asHTMLTokenzer method of m_tokenizer. Does > this pattern occur elsewhere in this file? RTTI is usually a bad idea. I'm not sure how useful this will be. I still need to do the same thing I am doing, checking if it is an HTMLTokenizer and casting it. But I haven't had much experience using this idiom. Document.cpp uses the following pattern a lot, which is similar: if (node->hasLocalName(someTag)) sheet = static_cast<HTMLSomeElement*>(n)->... There is one other use of `isHTMLTokenizer()`, in HTMLFormControlElement.cpp. > WebCore/dom/Document.cpp:1972 > + htmlTokenizer->setForceSynchronous(savedForceSynchronous); > Can we do this with a RAII class? The implementation you have is error prone. Added this in the recent patch. I'd appreciate feedback on the name to make it clearer what this is doing. I'm still worried about this test being too long. It takes ~7 seconds on my machine. If anyone else thinks that is too long I'll gladly make this a manual test.
Joseph Pecoraro
Comment 16 2010-05-05 16:22:57 PDT
> ~7 seconds on my machine I should clarify that this is In a debug build. It should be much faster in a Release build.
Adam Barth
Comment 17 2010-05-06 16:50:47 PDT
Comment on attachment 55169 [details] [PATCH] Fixed Comments - Added RAII Class Looks good. A 7 second test is somewhat worrying. Please address the comments below before landing. LayoutTests/fast/dom/script-tests/document-write-synchronous.js:3 + var immediateElementCount, delayedElementCount; I'm not sure this does what you think it does. WebCore/dom/Document.cpp:217 + m_htmlTokenizer = asHTMLTokenizer(tokenizer); Oh, I meant to add asHTMLTokenizer as a virtual function on Tokenizer to avoid the static_cast. WebCore/html/HTMLTokenizer.h:436 + ASSERT(tokenizer->isHTMLTokenizer()); Because write only works on HTML documents? I think it would be better to make this a virtual function and get rid of the static cast. WebCore/dom/Document.cpp:1984 + SynchronousHTMLTokenizerGuard tokenizerGuard(m_tokenizer.get()); Do we want to use { } to control the scope of this explicitly?
Joseph Pecoraro
Comment 18 2010-05-06 18:22:35 PDT
> > + m_htmlTokenizer = asHTMLTokenizer(tokenizer); > > > + ASSERT(tokenizer->isHTMLTokenizer()); > > I think it would be better to make this a virtual function and get rid of the static cast. Done. > WebCore/dom/Document.cpp:1984 > + SynchronousHTMLTokenizerGuard tokenizerGuard(m_tokenizer.get()); > Do we want to use { } to control the scope of this explicitly? Done. > > + var immediateElementCount, delayedElementCount; > > I'm not sure this does what you think it does. I made my test a manual test. Maciej mentioned it would be okay and I confirmed that Adam would still be okay with it. So, this is no longer necessary, but I can explain it. This makes two global variables. That way they can be used in shouldBe() which uses eval() with the global scope and so these variables would need to be in the global scope.
Joseph Pecoraro
Comment 19 2010-05-07 08:45:47 PDT
Committed r58950 M WebCore/dom/Tokenizer.h M WebCore/dom/Document.cpp M WebCore/ChangeLog M WebCore/html/HTMLTokenizer.h A WebCore/manual-tests/dom/document-write-synchronous-after-page-load.html r58950 = cb3dabac1b45534c1fac494a3de74c9a141439a4 (refs/remotes/trunk) http://trac.webkit.org/changeset/58950
Alexey Proskuryakov
Comment 20 2010-05-13 17:05:57 PDT
Note You need to log in before you can comment on or make changes to this bug.