In <rdar://problem/8228000> it was reported to us that an internal Apple site displays as an almost empty page with the HTML5 parser enabled. This happens because the page contains markup similar to the following: <script src="data:text/javascript," type="text/javascript" /> <p>Hello world</p> In Safari 5 this displays as “Hello world”. In tip of tree this displays as a blank page. I understand that this markup is not well-formed due the missing closing tag on the script element, but this seems to be a change in behavior that could have a serious backwards-compatibility impact on existing WebKit-only content.
Yep. This is on our list of things we expect to see compat problems with: https://docs.google.com/document/edit?id=1as5xYjyMSCph4960iz0-Kb7hZKf_L6f2vts57NMcVBI&hl=en So far, we haven't seen any examples of this on the public Internet. As far as I know, WebKit is the only rendering engine with this quirk.
I'm wondering if this is at all related to bug 42900 (not directly of course, but perhaps historically).
I'm sure if I understand the connection. Parsing HTML comments in script elements is somewhat tricky and takes up a bunch of states in the lexer.
Created attachment 68433 [details] Patch
Comment on attachment 68433 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=68433&action=review This looks good. One minor quibble. > WebCore/html/parser/HTMLTreeBuilder.cpp:2715 > + if (m_usePreHTML5ParserQuirks && token.selfClosing()) { > + processSelfClosingScriptTag(token); > + return true; > + } > + > processScriptStartTag(token); I would have expected the branch on the quirk to go after this line and then just process a fake end script tag. > WebCore/html/parser/HTMLTreeBuilder.cpp:2760 > + m_tree.insertScriptElement(token); > + m_originalInsertionMode = m_insertionMode; > + m_lastScriptElementStartLine = m_tokenizer->lineNumber(); > + setInsertionMode(TextMode); Can we share more code with processScriptStartTag ? > LayoutTests/fast/parser/pre-html5-parser-quirks.html:18 > +<iframe src="resources/pre-html5-parser-quirk-self-closing-script.html"></iframe> Can you add a test for the fragment mode too? Also, please add a test for self-closing script tags in the head element b/c that goes through a slightly different code path.
I think that this needs more tests for what is considered to be self-closing tag. E.g. <script /> is, but <script/> isn't. What about other whitespace?
(In reply to comment #5) > (From update of attachment 68433 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=68433&action=review > > This looks good. One minor quibble. > > > WebCore/html/parser/HTMLTreeBuilder.cpp:2715 > > + if (m_usePreHTML5ParserQuirks && token.selfClosing()) { > > + processSelfClosingScriptTag(token); > > + return true; > > + } > > + > > processScriptStartTag(token); > > I would have expected the branch on the quirk to go after this line and then just process a fake end script tag. > > > WebCore/html/parser/HTMLTreeBuilder.cpp:2760 > > + m_tree.insertScriptElement(token); > > + m_originalInsertionMode = m_insertionMode; > > + m_lastScriptElementStartLine = m_tokenizer->lineNumber(); > > + setInsertionMode(TextMode); > > Can we share more code with processScriptStartTag ? I didn't call processScriptStartTag() directly since it would leave the tokenizer in the ScriptDataState. I suppose I could set it back to DataState before inserting the fake end tag. Does that seem cleaner than writing a separate routine? > > > LayoutTests/fast/parser/pre-html5-parser-quirks.html:18 > > +<iframe src="resources/pre-html5-parser-quirk-self-closing-script.html"></iframe> > > Can you add a test for the fragment mode too? Also, please add a test for self-closing script tags in the head element b/c that goes through a slightly different code path. Will do.
(In reply to comment #6) > I think that this needs more tests for what is considered to be self-closing tag. E.g. <script /> is, but <script/> isn't. What about other whitespace? The tokenizer decides whether or not the script token is a self-closing tag, and applies the same rules it would for any other self-closing tag. My assumption is that self-closing tags in general are covered by pre-existing tokenizer tests. If it isn't, then they should be added, but this doesn't seem like the right place for such a test.
Comment on attachment 68433 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=68433&action=review >>> WebCore/html/parser/HTMLTreeBuilder.cpp:2715 >>> processScriptStartTag(token); >> >> I would have expected the branch on the quirk to go after this line and then just process a fake end script tag. > > I didn't call processScriptStartTag() directly since it would leave the tokenizer in the ScriptDataState. I suppose I could set it back to DataState before inserting the fake end tag. Does that seem cleaner than writing a separate routine? I see. The issue is that this code: http://trac.webkit.org/browser/trunk/WebCore/html/parser/HTMLTreeBuilder.cpp#L2197 doesn't set the tokenizer to the DataState because the tokenizer sets itself to the DataState. What I'd do is make that code explicitly set the tokenizer back to the data state and add an assert that we're only not already in the data state when m_usePreHTML5ParserQuirks is true. If we do that, then the rest of the machinery should do the right thing without much effort on our part.
(In reply to comment #8) > The tokenizer decides whether or not the script token is a self-closing tag, and applies the same rules it would for any other self-closing tag. This is the new behavior - but in Safari 5, we treated <script/> and <script /> differently, according to my testing. Extending this quirk to <script/> should be a conscious change.
> This is the new behavior - but in Safari 5, we treated <script/> and <script /> differently, according to my testing. Extending this quirk to <script/> should be a conscious change. I'm all for testing, but I'd like to try the HTML5 tokenization rules for self-closing tags in the hopes that they're sufficient here. Changing them for this quirk is possible, of course, but less desirable.
Sounds fine to me.
(In reply to comment #10) > (In reply to comment #8) > > The tokenizer decides whether or not the script token is a self-closing tag, and applies the same rules it would for any other self-closing tag. > > This is the new behavior - but in Safari 5, we treated <script/> and <script /> differently, according to my testing. Extending this quirk to <script/> should be a conscious change. Ohh I see. Can you describe the difference you are seeing? I tried this: <script src="data:text/javascript,alert('loaded script url')"/>PASS and this: <script src="data:text/javascript,alert('loaded script url')" />PASS In Safari 5, both cases displayed the alert and printed 'PASS'; in WebKit nightly, neither did. That indicates to me that the quirk should be extended to both cases, but maybe I'm missing something.
These are different in Safari 5.0.2: <script/>alert(0)</script> <script />alert(0)</script>
(In reply to comment #14) > These are different in Safari 5.0.2: > <script/>alert(0)</script> > <script />alert(0)</script> Ahh, okay thanks. It'll be good to at least have that documented as a change in behavior.
Created attachment 68484 [details] Patch
Comment on attachment 68484 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=68484&action=review I think Adam might want to do the final round of review, but I do have one comment. > WebCore/html/parser/HTMLTreeBuilder.cpp:2220 > + m_tokenizer->setState(HTMLTokenizer::DataState); > + } I would write this as: ASSERT(m_tokenizer->state() == HTMLTokenizer::DataState || m_usePreHTML5ParserQuirks); m_tokenizer->setState(HTMLTokenizer::DataState); Also, I’d try to keep the comment shorter.
Comment on attachment 68484 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=68484&action=review >> WebCore/html/parser/HTMLTreeBuilder.cpp:2220 >> + } > > I would write this as: > > ASSERT(m_tokenizer->state() == HTMLTokenizer::DataState || m_usePreHTML5ParserQuirks); > m_tokenizer->setState(HTMLTokenizer::DataState); > > Also, I’d try to keep the comment shorter. Yep. > WebCore/html/parser/HTMLTreeBuilder.cpp:2724 > + if (m_usePreHTML5ParserQuirks && token.selfClosing()) > + processFakeEndTag(scriptTag); Beautiful.
Thanks darin and abarth!
Committed r68115: <http://trac.webkit.org/changeset/68115>