Bug 42909 - REGRESSION (r61285): <script /> parses differently in HTML5
: REGRESSION (r61285): <script /> parses differently in HTML5
Status: RESOLVED FIXED
: WebKit
WebCore Misc.
: 528+ (Nightly build)
: PC Mac OS X 10.6
: P2 Normal
Assigned To:
:
: InRadar, Regression
:
: 41115 46334
  Show dependency treegraph
 
Reported: 2010-07-23 14:19 PST by
Modified: 2010-09-22 20:00 PST (History)


Attachments
Patch (12.36 KB, patch)
2010-09-22 13:40 PST, Andy Estes
no flags Review Patch | Details | Formatted Diff | Diff
Patch (14.78 KB, patch)
2010-09-22 18:00 PST, Andy Estes
abarth: review+
abarth: commit‑queue-
Review Patch | Details | Formatted Diff | Diff


Note

You need to log in before you can comment on or make changes to this bug.


Description From 2010-07-23 14:19:02 PST
In <rdar://problem/8228000> it was reported to us that an internal Apple site displays as an almost empty page with the HTML5 parser enabled.  This happens because the page contains markup similar to the following:

    <script src="data:text/javascript," type="text/javascript" />
    <p>Hello world</p>

In Safari 5 this displays as “Hello world”.  In tip of tree this displays as a blank page.

I understand that this markup is not well-formed due the missing closing tag on the script element, but this seems to be a change in behavior that could have a serious backwards-compatibility impact on existing WebKit-only content.
------- Comment #1 From 2010-07-23 14:35:25 PST -------
Yep.  This is on our list of things we expect to see compat problems with:

https://docs.google.com/document/edit?id=1as5xYjyMSCph4960iz0-Kb7hZKf_L6f2vts57NMcVBI&hl=en

So far, we haven't seen any examples of this on the public Internet.  As far as I know, WebKit is the only rendering engine with this quirk.
------- Comment #2 From 2010-07-24 17:26:53 PST -------
I'm wondering if this is at all related to bug 42900 (not directly of course, but perhaps historically).
------- Comment #3 From 2010-07-25 06:09:29 PST -------
I'm sure if I understand the connection.  Parsing HTML comments in script elements is somewhat tricky and takes up a bunch of states in the lexer.
------- Comment #4 From 2010-09-22 13:40:25 PST -------
Created an attachment (id=68433) [details]
Patch
------- Comment #5 From 2010-09-22 13:46:17 PST -------
(From update of attachment 68433 [details])
View in context: https://bugs.webkit.org/attachment.cgi?id=68433&action=review

This looks good.  One minor quibble.

> WebCore/html/parser/HTMLTreeBuilder.cpp:2715
> +        if (m_usePreHTML5ParserQuirks && token.selfClosing()) {
> +            processSelfClosingScriptTag(token);
> +            return true;
> +        }
> +        
>          processScriptStartTag(token);

I would have expected the branch on the quirk to go after this line and then just process a fake end script tag.

> WebCore/html/parser/HTMLTreeBuilder.cpp:2760
> +    m_tree.insertScriptElement(token);
> +    m_originalInsertionMode = m_insertionMode;
> +    m_lastScriptElementStartLine = m_tokenizer->lineNumber();
> +    setInsertionMode(TextMode);

Can we share more code with processScriptStartTag ?

> LayoutTests/fast/parser/pre-html5-parser-quirks.html:18
> +<iframe src="resources/pre-html5-parser-quirk-self-closing-script.html"></iframe>

Can you add a test for the fragment mode too?  Also, please add a test for self-closing script tags in the head element b/c that goes through a slightly different code path.
------- Comment #6 From 2010-09-22 14:13:54 PST -------
I think that this needs more tests for what is considered to be self-closing tag. E.g. <script /> is, but <script/> isn't. What about other whitespace?
------- Comment #7 From 2010-09-22 14:47:37 PST -------
(In reply to comment #5)
> (From update of attachment 68433 [details] [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=68433&action=review
> 
> This looks good.  One minor quibble.
> 
> > WebCore/html/parser/HTMLTreeBuilder.cpp:2715
> > +        if (m_usePreHTML5ParserQuirks && token.selfClosing()) {
> > +            processSelfClosingScriptTag(token);
> > +            return true;
> > +        }
> > +        
> >          processScriptStartTag(token);
> 
> I would have expected the branch on the quirk to go after this line and then just process a fake end script tag.
> 
> > WebCore/html/parser/HTMLTreeBuilder.cpp:2760
> > +    m_tree.insertScriptElement(token);
> > +    m_originalInsertionMode = m_insertionMode;
> > +    m_lastScriptElementStartLine = m_tokenizer->lineNumber();
> > +    setInsertionMode(TextMode);
> 
> Can we share more code with processScriptStartTag ?

I didn't call processScriptStartTag() directly since it would leave the tokenizer in the ScriptDataState. I suppose I could set it back to DataState before inserting the fake end tag. Does that seem cleaner than writing a separate routine?

> 
> > LayoutTests/fast/parser/pre-html5-parser-quirks.html:18
> > +<iframe src="resources/pre-html5-parser-quirk-self-closing-script.html"></iframe>
> 
> Can you add a test for the fragment mode too?  Also, please add a test for self-closing script tags in the head element b/c that goes through a slightly different code path.

Will do.
------- Comment #8 From 2010-09-22 14:52:08 PST -------
(In reply to comment #6)
> I think that this needs more tests for what is considered to be self-closing tag. E.g. <script /> is, but <script/> isn't. What about other whitespace?

The tokenizer decides whether or not the script token is a self-closing tag, and applies the same rules it would for any other self-closing tag. My assumption is that self-closing tags in general are covered by pre-existing tokenizer tests. If it isn't, then they should be added, but this doesn't seem like the right place for such a test.
------- Comment #9 From 2010-09-22 15:02:57 PST -------
(From update of attachment 68433 [details])
View in context: https://bugs.webkit.org/attachment.cgi?id=68433&action=review

>>> WebCore/html/parser/HTMLTreeBuilder.cpp:2715
>>>          processScriptStartTag(token);
>> 
>> I would have expected the branch on the quirk to go after this line and then just process a fake end script tag.
> 
> I didn't call processScriptStartTag() directly since it would leave the tokenizer in the ScriptDataState. I suppose I could set it back to DataState before inserting the fake end tag. Does that seem cleaner than writing a separate routine?

I see.  The issue is that this code:

http://trac.webkit.org/browser/trunk/WebCore/html/parser/HTMLTreeBuilder.cpp#L2197

doesn't set the tokenizer to the DataState because the tokenizer sets itself to the DataState.  What I'd do is make that code explicitly set the tokenizer back to the data state and add an assert that we're only not already in the data state when m_usePreHTML5ParserQuirks is true.  If we do that, then the rest of the machinery should do the right thing without much effort on our part.
------- Comment #10 From 2010-09-22 15:56:11 PST -------
(In reply to comment #8)
> The tokenizer decides whether or not the script token is a self-closing tag, and applies the same rules it would for any other self-closing tag.

This is the new behavior - but in Safari 5, we treated <script/> and <script /> differently, according to my testing. Extending this quirk to <script/> should be a conscious change.
------- Comment #11 From 2010-09-22 15:59:35 PST -------
> This is the new behavior - but in Safari 5, we treated <script/> and <script /> differently, according to my testing. Extending this quirk to <script/> should be a conscious change.

I'm all for testing, but I'd like to try the HTML5 tokenization rules for self-closing tags in the hopes that they're sufficient here.  Changing them for this quirk is possible, of course, but less desirable.
------- Comment #12 From 2010-09-22 16:17:05 PST -------
Sounds fine to me.
------- Comment #13 From 2010-09-22 16:19:37 PST -------
(In reply to comment #10)
> (In reply to comment #8)
> > The tokenizer decides whether or not the script token is a self-closing tag, and applies the same rules it would for any other self-closing tag.
> 
> This is the new behavior - but in Safari 5, we treated <script/> and <script /> differently, according to my testing. Extending this quirk to <script/> should be a conscious change.

Ohh I see. Can you describe the difference you are seeing?

I tried this:

<script src="data:text/javascript,alert('loaded script url')"/>PASS

and this:

<script src="data:text/javascript,alert('loaded script url')" />PASS

In Safari 5, both cases displayed the alert and printed 'PASS'; in WebKit nightly, neither did. That indicates to me that the quirk should be extended to both cases, but maybe I'm missing something.
------- Comment #14 From 2010-09-22 16:23:15 PST -------
These are different in Safari 5.0.2:
<script/>alert(0)</script>
<script />alert(0)</script>
------- Comment #15 From 2010-09-22 16:26:54 PST -------
(In reply to comment #14)
> These are different in Safari 5.0.2:
> <script/>alert(0)</script>
> <script />alert(0)</script>

Ahh, okay thanks. It'll be good to at least have that documented as a change in behavior.
------- Comment #16 From 2010-09-22 18:00:13 PST -------
Created an attachment (id=68484) [details]
Patch
------- Comment #17 From 2010-09-22 18:29:41 PST -------
(From update of attachment 68484 [details])
View in context: https://bugs.webkit.org/attachment.cgi?id=68484&action=review

I think Adam might want to do the final round of review, but I do have one comment.

> WebCore/html/parser/HTMLTreeBuilder.cpp:2220
> +                m_tokenizer->setState(HTMLTokenizer::DataState);
> +            }

I would write this as:

    ASSERT(m_tokenizer->state() == HTMLTokenizer::DataState || m_usePreHTML5ParserQuirks);
    m_tokenizer->setState(HTMLTokenizer::DataState);

Also, I’d try to keep the comment shorter.
------- Comment #18 From 2010-09-22 19:11:34 PST -------
(From update of attachment 68484 [details])
View in context: https://bugs.webkit.org/attachment.cgi?id=68484&action=review

>> WebCore/html/parser/HTMLTreeBuilder.cpp:2220
>> +            }
> 
> I would write this as:
> 
>     ASSERT(m_tokenizer->state() == HTMLTokenizer::DataState || m_usePreHTML5ParserQuirks);
>     m_tokenizer->setState(HTMLTokenizer::DataState);
> 
> Also, I’d try to keep the comment shorter.

Yep.

> WebCore/html/parser/HTMLTreeBuilder.cpp:2724
> +        if (m_usePreHTML5ParserQuirks && token.selfClosing())
> +            processFakeEndTag(scriptTag);

Beautiful.
------- Comment #19 From 2010-09-22 19:16:27 PST -------
Thanks darin and abarth!
------- Comment #20 From 2010-09-22 19:39:40 PST -------
Committed r68115: <http://trac.webkit.org/changeset/68115>