RESOLVED FIXED 42909
REGRESSION (r61285): <script /> parses differently in HTML5
https://bugs.webkit.org/show_bug.cgi?id=42909
Summary REGRESSION (r61285): <script /> parses differently in HTML5
Mark Rowe (bdash)
Reported 2010-07-23 14:19:02 PDT
In <rdar://problem/8228000> it was reported to us that an internal Apple site displays as an almost empty page with the HTML5 parser enabled. This happens because the page contains markup similar to the following: <script src="data:text/javascript," type="text/javascript" /> <p>Hello world</p> In Safari 5 this displays as “Hello world”. In tip of tree this displays as a blank page. I understand that this markup is not well-formed due the missing closing tag on the script element, but this seems to be a change in behavior that could have a serious backwards-compatibility impact on existing WebKit-only content.
Attachments
Patch (12.36 KB, patch)
2010-09-22 13:40 PDT, Andy Estes
no flags
Patch (14.78 KB, patch)
2010-09-22 18:00 PDT, Andy Estes
abarth: review+
abarth: commit-queue-
Adam Barth
Comment 1 2010-07-23 14:35:25 PDT
Yep. This is on our list of things we expect to see compat problems with: https://docs.google.com/document/edit?id=1as5xYjyMSCph4960iz0-Kb7hZKf_L6f2vts57NMcVBI&hl=en So far, we haven't seen any examples of this on the public Internet. As far as I know, WebKit is the only rendering engine with this quirk.
Alexey Proskuryakov
Comment 2 2010-07-24 17:26:53 PDT
I'm wondering if this is at all related to bug 42900 (not directly of course, but perhaps historically).
Adam Barth
Comment 3 2010-07-25 06:09:29 PDT
I'm sure if I understand the connection. Parsing HTML comments in script elements is somewhat tricky and takes up a bunch of states in the lexer.
Andy Estes
Comment 4 2010-09-22 13:40:25 PDT
Adam Barth
Comment 5 2010-09-22 13:46:17 PDT
Comment on attachment 68433 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=68433&action=review This looks good. One minor quibble. > WebCore/html/parser/HTMLTreeBuilder.cpp:2715 > + if (m_usePreHTML5ParserQuirks && token.selfClosing()) { > + processSelfClosingScriptTag(token); > + return true; > + } > + > processScriptStartTag(token); I would have expected the branch on the quirk to go after this line and then just process a fake end script tag. > WebCore/html/parser/HTMLTreeBuilder.cpp:2760 > + m_tree.insertScriptElement(token); > + m_originalInsertionMode = m_insertionMode; > + m_lastScriptElementStartLine = m_tokenizer->lineNumber(); > + setInsertionMode(TextMode); Can we share more code with processScriptStartTag ? > LayoutTests/fast/parser/pre-html5-parser-quirks.html:18 > +<iframe src="resources/pre-html5-parser-quirk-self-closing-script.html"></iframe> Can you add a test for the fragment mode too? Also, please add a test for self-closing script tags in the head element b/c that goes through a slightly different code path.
Alexey Proskuryakov
Comment 6 2010-09-22 14:13:54 PDT
I think that this needs more tests for what is considered to be self-closing tag. E.g. <script /> is, but <script/> isn't. What about other whitespace?
Andy Estes
Comment 7 2010-09-22 14:47:37 PDT
(In reply to comment #5) > (From update of attachment 68433 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=68433&action=review > > This looks good. One minor quibble. > > > WebCore/html/parser/HTMLTreeBuilder.cpp:2715 > > + if (m_usePreHTML5ParserQuirks && token.selfClosing()) { > > + processSelfClosingScriptTag(token); > > + return true; > > + } > > + > > processScriptStartTag(token); > > I would have expected the branch on the quirk to go after this line and then just process a fake end script tag. > > > WebCore/html/parser/HTMLTreeBuilder.cpp:2760 > > + m_tree.insertScriptElement(token); > > + m_originalInsertionMode = m_insertionMode; > > + m_lastScriptElementStartLine = m_tokenizer->lineNumber(); > > + setInsertionMode(TextMode); > > Can we share more code with processScriptStartTag ? I didn't call processScriptStartTag() directly since it would leave the tokenizer in the ScriptDataState. I suppose I could set it back to DataState before inserting the fake end tag. Does that seem cleaner than writing a separate routine? > > > LayoutTests/fast/parser/pre-html5-parser-quirks.html:18 > > +<iframe src="resources/pre-html5-parser-quirk-self-closing-script.html"></iframe> > > Can you add a test for the fragment mode too? Also, please add a test for self-closing script tags in the head element b/c that goes through a slightly different code path. Will do.
Andy Estes
Comment 8 2010-09-22 14:52:08 PDT
(In reply to comment #6) > I think that this needs more tests for what is considered to be self-closing tag. E.g. <script /> is, but <script/> isn't. What about other whitespace? The tokenizer decides whether or not the script token is a self-closing tag, and applies the same rules it would for any other self-closing tag. My assumption is that self-closing tags in general are covered by pre-existing tokenizer tests. If it isn't, then they should be added, but this doesn't seem like the right place for such a test.
Adam Barth
Comment 9 2010-09-22 15:02:57 PDT
Comment on attachment 68433 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=68433&action=review >>> WebCore/html/parser/HTMLTreeBuilder.cpp:2715 >>> processScriptStartTag(token); >> >> I would have expected the branch on the quirk to go after this line and then just process a fake end script tag. > > I didn't call processScriptStartTag() directly since it would leave the tokenizer in the ScriptDataState. I suppose I could set it back to DataState before inserting the fake end tag. Does that seem cleaner than writing a separate routine? I see. The issue is that this code: http://trac.webkit.org/browser/trunk/WebCore/html/parser/HTMLTreeBuilder.cpp#L2197 doesn't set the tokenizer to the DataState because the tokenizer sets itself to the DataState. What I'd do is make that code explicitly set the tokenizer back to the data state and add an assert that we're only not already in the data state when m_usePreHTML5ParserQuirks is true. If we do that, then the rest of the machinery should do the right thing without much effort on our part.
Alexey Proskuryakov
Comment 10 2010-09-22 15:56:11 PDT
(In reply to comment #8) > The tokenizer decides whether or not the script token is a self-closing tag, and applies the same rules it would for any other self-closing tag. This is the new behavior - but in Safari 5, we treated <script/> and <script /> differently, according to my testing. Extending this quirk to <script/> should be a conscious change.
Adam Barth
Comment 11 2010-09-22 15:59:35 PDT
> This is the new behavior - but in Safari 5, we treated <script/> and <script /> differently, according to my testing. Extending this quirk to <script/> should be a conscious change. I'm all for testing, but I'd like to try the HTML5 tokenization rules for self-closing tags in the hopes that they're sufficient here. Changing them for this quirk is possible, of course, but less desirable.
Alexey Proskuryakov
Comment 12 2010-09-22 16:17:05 PDT
Sounds fine to me.
Andy Estes
Comment 13 2010-09-22 16:19:37 PDT
(In reply to comment #10) > (In reply to comment #8) > > The tokenizer decides whether or not the script token is a self-closing tag, and applies the same rules it would for any other self-closing tag. > > This is the new behavior - but in Safari 5, we treated <script/> and <script /> differently, according to my testing. Extending this quirk to <script/> should be a conscious change. Ohh I see. Can you describe the difference you are seeing? I tried this: <script src="data:text/javascript,alert('loaded script url')"/>PASS and this: <script src="data:text/javascript,alert('loaded script url')" />PASS In Safari 5, both cases displayed the alert and printed 'PASS'; in WebKit nightly, neither did. That indicates to me that the quirk should be extended to both cases, but maybe I'm missing something.
Alexey Proskuryakov
Comment 14 2010-09-22 16:23:15 PDT
These are different in Safari 5.0.2: <script/>alert(0)</script> <script />alert(0)</script>
Andy Estes
Comment 15 2010-09-22 16:26:54 PDT
(In reply to comment #14) > These are different in Safari 5.0.2: > <script/>alert(0)</script> > <script />alert(0)</script> Ahh, okay thanks. It'll be good to at least have that documented as a change in behavior.
Andy Estes
Comment 16 2010-09-22 18:00:13 PDT
Darin Adler
Comment 17 2010-09-22 18:29:41 PDT
Comment on attachment 68484 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=68484&action=review I think Adam might want to do the final round of review, but I do have one comment. > WebCore/html/parser/HTMLTreeBuilder.cpp:2220 > + m_tokenizer->setState(HTMLTokenizer::DataState); > + } I would write this as: ASSERT(m_tokenizer->state() == HTMLTokenizer::DataState || m_usePreHTML5ParserQuirks); m_tokenizer->setState(HTMLTokenizer::DataState); Also, I’d try to keep the comment shorter.
Adam Barth
Comment 18 2010-09-22 19:11:34 PDT
Comment on attachment 68484 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=68484&action=review >> WebCore/html/parser/HTMLTreeBuilder.cpp:2220 >> + } > > I would write this as: > > ASSERT(m_tokenizer->state() == HTMLTokenizer::DataState || m_usePreHTML5ParserQuirks); > m_tokenizer->setState(HTMLTokenizer::DataState); > > Also, I’d try to keep the comment shorter. Yep. > WebCore/html/parser/HTMLTreeBuilder.cpp:2724 > + if (m_usePreHTML5ParserQuirks && token.selfClosing()) > + processFakeEndTag(scriptTag); Beautiful.
Andy Estes
Comment 19 2010-09-22 19:16:27 PDT
Thanks darin and abarth!
Andy Estes
Comment 20 2010-09-22 19:39:40 PDT
Note You need to log in before you can comment on or make changes to this bug.