Bug 42365 - HTML5 Parser: document.write in a asynchronous script which is specified to load before page finish blows away document
Summary: HTML5 Parser: document.write in a asynchronous script which is specified to l...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Adam Barth
URL: http://www.tianya.cn/publicforum/arti...
Keywords:
Depends on: 40745
Blocks: 41115
  Show dependency treegraph
 
Reported: 2010-07-15 06:39 PDT by Johnny(Jianning) Ding
Modified: 2010-07-23 15:14 PDT (History)
5 users (show)

See Also:


Attachments
html file of the test case (240 bytes, text/html)
2010-07-15 06:42 PDT, Johnny(Jianning) Ding
no flags Details
js file of the test case (39 bytes, application/x-javascript)
2010-07-15 06:43 PDT, Johnny(Jianning) Ding
no flags Details
Patch (13.31 KB, patch)
2010-07-22 14:23 PDT, Adam Barth
no flags Details | Formatted Diff | Diff
Patch (16.16 KB, patch)
2010-07-22 20:22 PDT, Adam Barth
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Johnny(Jianning) Ding 2010-07-15 06:39:58 PDT
How to reproduce this issue
1. Visit http://www.tianya.cn/publicforum/articleslist/0/funinfo.shtml
2. The normal content shows up for a split second
3. The normal content disappears, and a ads shows up instead

The code path of ads displayed in the page is: document.getElementById("googleAdsense").src = "http://pagead2.googlesyndication.com/pagead/show_ads.js".
The external script "show_ads.js" called document.write to write ads in the page. But since the external script was added before page finish, the author of web page expected that the document.write should be called before page finish, and that document.write should not below away the document.

Here I use a test case to reproduce this bug. there are two files (test_doc_write.html & test_doc_write.js) in this test case.

test_doc_write.html:
<body>
hello
<script id="tester"></script>
<script>
document.getElementById("tester").src = "test_doc_write.js";
</script>
<script>
document.write("<div>write sth later</div>");
</script>
</body>

test_doc_write.js:
document.write("<div>write sth</div>")

Run the test case in IE8, result is:
hello 
write sth
write sth later

Run the test case in FF, result is:
hello
write sth later
write sth

Run the test case in WebKit result is:
write sth

In WebCore, when script's src is changed, it put the script in a queue(in ScriptElementData::notifyFinished) and executes the script in asynchronously (in Document::executeScriptSoonTimerFired).
In this case, when executing the script, the parser was already finished and shut down, so function Document::write blew away the original doc, opened a new doc.
Comment 1 Johnny(Jianning) Ding 2010-07-15 06:42:39 PDT
Created attachment 61647 [details]
html file of the test case
Comment 2 Johnny(Jianning) Ding 2010-07-15 06:43:10 PDT
Created attachment 61648 [details]
js file of the test case
Comment 3 Johnny(Jianning) Ding 2010-07-15 07:11:02 PDT
When talking with Adam for this issue, he told me to read a related W3C bug. (http://www.w3.org/Bugs/Public/show_bug.cgi?id=9767)

In the comment#10 of that bug, Ian provided a solution: "It would need a flag on the <script> element set when the element is added to the document, based on
whether the parser is on the stack, and then for the script execution to set a
similar flag".
According to Ian's solution, if script is added when document's parser is on the stack, which means the author of the webpage wants to execute the script in current document, and the document.write in that script should not blow way the document. This solution looks reasonable for me.

My thought to fix this bug is
1.Add a flag in ScriptElementData to indicate whether the script is added when document's parser is on the stack. (the ScriptElementData::m_createByParser is right flag to use?)
2. in Document.write, if a script is running (check both HTMLDocumentParser::m_scriptRunner and Document::m_scriptsToExecuteSoon), and if the script was added when document's parser was on the stack, ignore this document.write. Otherwise do write.
Comment 4 Eric Seidel (no email) 2010-07-15 11:30:46 PDT
I think this is a duplicate of bug 40745, no?
Comment 5 Adam Barth 2010-07-15 12:55:48 PDT
(In reply to comment #4)
> I think this is a duplicate of bug 40745, no?

Unclear.  There are a bunch of different cases here.  I fixed the cases in Bug 40745 to the extent that the referenced sites work properly.  In the end, we're going to them a the same time by updating to what the spec says to do.
Comment 6 Johnny(Jianning) Ding 2010-07-16 04:59:17 PDT
Adam's r63165: <http://trac.webkit.org/changeset/63165> already fixed the issue on http://www.tianya.cn/publicforum/articleslist/0/funinfo.shtml.

But there is a race condition. if "show_ads.js" is the last resource of the main doc, then parser will be closed before executing the "show_ads.js". Then when executing "show_ads.js", document.write will blow away the document. (Like the test case I gave in this issue)
Comment 7 Tony Gentilcore 2010-07-22 10:24:56 PDT
(In reply to comment #6)
> Adam's r63165: <http://trac.webkit.org/changeset/63165> already fixed the issue on http://www.tianya.cn/publicforum/articleslist/0/funinfo.shtml.
> 
> But there is a race condition. if "show_ads.js" is the last resource of the main doc, then parser will be closed before executing the "show_ads.js". Then when executing "show_ads.js", document.write will blow away the document. (Like the test case I gave in this issue)

Here's another site that repros:
https://www.delta.com/booking/searchFlights.do?displayTripType=multicity

This appears to be the trigger that adds the async script:
<script type="text/javascript" language="JavaScript"> 
function embedChat(){
	var chatScript = document.createElement('script');
	chatScript.type = 'text/javascript';
	chatScript.src = "https://kanachat.delta.com/tracker/vtc.php?orgid=1044339&ichannelid=Y91zET1044345";
	get("online_chat").appendChild(chatScript);
}
addEvent(window, 'load', embedChat, false); 
</script>
Comment 8 Adam Barth 2010-07-22 14:23:22 PDT
Created attachment 62338 [details]
Patch
Comment 9 Eric Seidel (no email) 2010-07-22 16:05:49 PDT
Comment on attachment 62338 [details]
Patch

I'm having trouble understanding this patch.

LayoutTests/http/tests/misc/write-from-dom-script.html:9
 +  script.src = "data:text/javascript,document.write('FAIL');document.close();";
It's sad that this script would "PASS" if your script had an error.  Would be better if the script set some global that was also checked and could cause FAIL.

LayoutTests/http/tests/misc/write-while-waiting.html:8
 +  setTimeout("document.write('PASS');document.close();", 100);
Would be nice if you could explain in the ChangeLog why this is right?  Maybe you did and I didn't understand.

WebCore/dom/Document.cpp:366
 +      , m_isWriteNeutralised(false)
It's sad that this is on Document and not DocumentWriter or something else.  Document is such a dumping ground.

WebCore/dom/Document.cpp:2020
 +      bool insertionPointIsUndefined = !m_parser || !m_parser->hasInsertionPoint();
This would have been more clear (to me) as:
bool hasInsertionPoint = m_parser && m_parser->hasInsertionPoint();

WebCore/dom/Document.cpp:2023
 +      if (insertionPointIsUndefined && isWriteNeutralised())
Then this becomes !hasInsertionPoint

WebCore/dom/Document.h:1005
 +      bool isWriteNeutralised() const { return m_isWriteNeutralised; }
I wish the spec called this "writeDisabled", Neutralized is a poor name IMO.
Hard to spell, hard to type, hard for our numerous non-english speakers to understand.

WebCore/dom/DocumentParser.h:44
 +      virtual bool hasInsertionPoint() { return true; }
So what does this mean when true?  Who should override it?  If it's true, you'll always get an insert call?

WebCore/dom/ScriptElement.cpp:59
 +      // If the element's Document has an active parser, and the parser's script
This comment makes it seem like this should not go inside ScriptElement.

WebCore/dom/ScriptElement.cpp:61
 +      // "parser-inserted" flag set, the user agent must set the element's
Or maybe that the Document should be notified that the script was inserted and it should make a decision to nueter itself or not.

WebCore/dom/ScriptElement.cpp: 
 +  static inline bool useHTML5Parser(Document* document)
This was previous dead code?

WebCore/dom/ScriptElement.cpp:203
 +          // flag was set as being itself "write-neutralised". Let neutralised doc
God this is an awful name.  I need to complain to Hixie.
Done: http://www.w3.org/Bugs/Public/show_bug.cgi?id=10228

Btw, my spelling corrector seems to think it's "neutralized" not "neutralised".  Yet another reason why that name needs to go.


WebCore/html/HTMLDocumentParser.cpp:218
 +      if (m_scriptRunner && m_scriptRunner->inScriptExecution())
This needs a comment to explain why.  It's not clear why "inScirptExecution" means there is an insertion point.  Why would that even be needed?  Can't we check m_input?

WebCore/html/HTMLDocumentParser.cpp:220
 +      // FIXME: Checking whether we've seen end of file isn't quite right when
I don't undersand this FIXME.  Can you explain better?
Comment 10 Eric Seidel (no email) 2010-07-22 16:06:36 PDT
I think I can guess why, but can we document somewhere why the LegacyHTMLDocumentParser doesn't want a isWriteNeuteredOrWhateverTheBrokenNameInTheSpecIs() function?
Comment 11 Eric Seidel (no email) 2010-07-22 16:39:46 PDT
Comment on attachment 62338 [details]
Patch

WebCore/dom/ScriptElement.cpp:65
 +          data.setIsWriteNeutralised(true);
Once write neutered, always neutered.  Is this the idea?  Or should this flag get reset if the script element is moved to a new document before it's fully loaded?  Not sure, sounds like a complicated edge case.
Comment 12 Adam Barth 2010-07-22 18:08:30 PDT
> I'm having trouble understanding this patch.

Eric and I discussed the patch verbally.

> LayoutTests/http/tests/misc/write-from-dom-script.html:9
>  +  script.src = "data:text/javascript,document.write('FAIL');document.close();";
> It's sad that this script would "PASS" if your script had an error.  Would be better if the script set some global that was also checked and could cause FAIL.

Fixed.

> LayoutTests/http/tests/misc/write-while-waiting.html:8
>  +  setTimeout("document.write('PASS');document.close();", 100);
> Would be nice if you could explain in the ChangeLog why this is right?  Maybe you did and I didn't understand.

I added more details to the ChangeLog.  Ian's trying to narrowly address the compatibility problem.

> WebCore/dom/Document.cpp:366
>  +      , m_isWriteNeutralised(false)
> It's sad that this is on Document and not DocumentWriter or something else.  Document is such a dumping ground.

Of all the state on document, this one is very appropriate.  The spec stores the state on Document and it's read by Document::write, which seems to really belong on document.

> WebCore/dom/Document.cpp:2020
>  +      bool insertionPointIsUndefined = !m_parser || !m_parser->hasInsertionPoint();
> This would have been more clear (to me) as:
> bool hasInsertionPoint = m_parser && m_parser->hasInsertionPoint();

Done.

> WebCore/dom/Document.cpp:2023
>  +      if (insertionPointIsUndefined && isWriteNeutralised())
> Then this becomes !hasInsertionPoint

Yep.

> WebCore/dom/Document.h:1005
>  +      bool isWriteNeutralised() const { return m_isWriteNeutralised; }
> I wish the spec called this "writeDisabled", Neutralized is a poor name IMO.
> Hard to spell, hard to type, hard for our numerous non-english speakers to understand.

Done.  I chose the name to match the spec.  It still has a hyperlink to the spec, so folks should be able to figure it out.

> WebCore/dom/DocumentParser.h:44
>  +      virtual bool hasInsertionPoint() { return true; }
> So what does this mean when true?  Who should override it?  If it's true, you'll always get an insert call?

We discussed this over the phone.  The default should be false, but that messes up some SVG font code.  I'll make another patch that changes the default to false and fixes the SVG font code to use the right APIs.

> WebCore/dom/ScriptElement.cpp:59
>  +      // If the element's Document has an active parser, and the parser's script
> This comment makes it seem like this should not go inside ScriptElement.

I didn't understand this comment.  It's in the "script element" section of the spec and it's manipulating some part of the script element's state...  Putting the code here seems to make sense.

> WebCore/dom/ScriptElement.cpp:61
>  +      // "parser-inserted" flag set, the user agent must set the element's
> Or maybe that the Document should be notified that the script was inserted and it should make a decision to nueter itself or not.

I think this comment stems from your confusing about how the dataflow works.  The document does get notified, but when the script runs.  Here we're remembering what life was like when we were inserted into the document.

> WebCore/dom/ScriptElement.cpp: 
>  +  static inline bool useHTML5Parser(Document* document)
> This was previous dead code?

Yes.  Technically this change is unrelated, but I just saw it and removed it.

> WebCore/dom/ScriptElement.cpp:203
>  +          // flag was set as being itself "write-neutralised". Let neutralised doc
> God this is an awful name.  I need to complain to Hixie.
> Done: http://www.w3.org/Bugs/Public/show_bug.cgi?id=10228
> 
> Btw, my spelling corrector seems to think it's "neutralized" not "neutralised".  Yet another reason why that name needs to go.

The name is gone, from the code at least.

> WebCore/html/HTMLDocumentParser.cpp:218
>  +      if (m_scriptRunner && m_scriptRunner->inScriptExecution())
> This needs a comment to explain why.  It's not clear why "inScirptExecution" means there is an insertion point.  Why would that even be needed?  Can't we check m_input?

I've changed this to get the information from m_input.  I've also renamed HTMLScriptRunner::inScriptExecution to something more informative.

> WebCore/html/HTMLDocumentParser.cpp:220
>  +      // FIXME: Checking whether we've seen end of file isn't quite right when
> I don't undersand this FIXME.  Can you explain better?

I've removed it.
Comment 13 Adam Barth 2010-07-22 20:22:32 PDT
Created attachment 62376 [details]
Patch
Comment 14 Eric Seidel (no email) 2010-07-23 13:53:01 PDT
Comment on attachment 62376 [details]
Patch

WebCore/dom/Document.cpp:2021
 +      // "write-neutralised" flag set, then abort these steps.
Doesn't match the code, but matches the spec, so OK.

WebCore/dom/Document.cpp:2026
 +      if (!hasInsertionPoint)
I thought you said you had to change this back to being if (m_parser)?

WebCore/dom/Document.cpp:366
 +      , m_writeDisabled(false)
I hope there is just the one constructor...

WebCore/dom/ScriptElement.cpp:65
 +          data.setWriteDisabled(true);
We'll never set this back to "false" if we move into a different document at a later time.  Should we?

WebCore/dom/ScriptElement.cpp:207
 +              m_element->document()->setWriteDisabled(true);
This probably would have been cleaner as a stack-object (like you're so fond of). :)

WebCore/html/HTMLInputStream.h:72
 +              // FIXME: Somehow we need to understand the difference between
I thought you said this FIXME could go away if we moved the check back to m_parser instead?  What are the concequences of this FIXME?  What are we doing wrong now?  How bad is it?

WebCore/dom/DocumentParser.h:44
 +      virtual bool hasInsertionPoint() { return true; }
You mentioned having this return false?
Comment 15 Adam Barth 2010-07-23 14:00:06 PDT
>  +      // "write-neutralised" flag set, then abort these steps.
> Doesn't match the code, but matches the spec, so OK.

These are just quotes from the spec.

> WebCore/dom/Document.cpp:2026
>  +      if (!hasInsertionPoint)
> I thought you said you had to change this back to being if (m_parser)?

That turned out not to work because of documents created by document.open().  At a basic level, we need to distinguish between the two cases in order to get things 100% correct.

> I hope there is just the one constructor...

I believe so.

> We'll never set this back to "false" if we move into a different document at a later time.  Should we?

I'm not sure how to write a test case to observe the difference since scripts only run once.  Maybe if it gets moved between the first attack and when it runs?  I think our behavior is correct w.r.t. the spec in that case anyway since the spec never changes this flag.

>  +              m_element->document()->setWriteDisabled(true);
> This probably would have been cleaner as a stack-object (like you're so fond of). :)

Maybe.  I can change it if you like, but I tried to keep my zest for RAIIs contained.

>  +              // FIXME: Somehow we need to understand the difference between
> I thought you said this FIXME could go away if we moved the check back to m_parser instead?  What are the concequences of this FIXME?  What are we doing wrong now?  How bad is it?

That turned out not to work.  The consequence is if the async script runs before we've gotten EOF from the network, we'll document.write into the network byte stream at the point where we're waiting for bytes from the network.  This matches our original behavior.  We can polish this further in a followup patch since that case is much more obscure than the bug we're trying to fix (and the behavior matches our old behavior).

> WebCore/dom/DocumentParser.h:44
>  +      virtual bool hasInsertionPoint() { return true; }
> You mentioned having this return false?

I'll do that in a followup patch.  It involves donking around with the SVG font code, so I didn't want to do it in this patch.
Comment 16 Eric Seidel (no email) 2010-07-23 14:36:46 PDT
Comment on attachment 62376 [details]
Patch

I think your zest for RAIIs is totally justified in this case, but it's not a big deal.  Just easier to get this "wrong" in any second usage were there to be one.
Comment 17 Eric Seidel (no email) 2010-07-23 14:36:54 PDT
Thank you for the replies.
Comment 18 Adam Barth 2010-07-23 14:37:45 PDT
Comment on attachment 62376 [details]
Patch

Thanks for the detailed review.  :)
Comment 19 WebKit Commit Bot 2010-07-23 15:14:18 PDT
Comment on attachment 62376 [details]
Patch

Clearing flags on attachment: 62376

Committed r63998: <http://trac.webkit.org/changeset/63998>
Comment 20 WebKit Commit Bot 2010-07-23 15:14:24 PDT
All reviewed patches have been landed.  Closing bug.