Bug 42909 - REGRESSION (r61285): <script /> parses differently in HTML5
Summary: REGRESSION (r61285): <script /> parses differently in HTML5
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: 528+ (Nightly build)
Hardware: PC OS X 10.6
: P2 Normal
Assignee: Andy Estes
URL:
Keywords: InRadar, Regression
Depends on:
Blocks: 41115 46334
  Show dependency treegraph
 
Reported: 2010-07-23 14:19 PDT by Mark Rowe (bdash)
Modified: 2010-09-22 20:00 PDT (History)
7 users (show)

See Also:


Attachments
Patch (12.36 KB, patch)
2010-09-22 13:40 PDT, Andy Estes
no flags Details | Formatted Diff | Diff
Patch (14.78 KB, patch)
2010-09-22 18:00 PDT, Andy Estes
abarth: review+
abarth: commit-queue-
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Mark Rowe (bdash) 2010-07-23 14:19:02 PDT
In <rdar://problem/8228000> it was reported to us that an internal Apple site displays as an almost empty page with the HTML5 parser enabled.  This happens because the page contains markup similar to the following:

    <script src="data:text/javascript," type="text/javascript" />
    <p>Hello world</p>

In Safari 5 this displays as “Hello world”.  In tip of tree this displays as a blank page.

I understand that this markup is not well-formed due the missing closing tag on the script element, but this seems to be a change in behavior that could have a serious backwards-compatibility impact on existing WebKit-only content.
Comment 1 Adam Barth 2010-07-23 14:35:25 PDT
Yep.  This is on our list of things we expect to see compat problems with:

https://docs.google.com/document/edit?id=1as5xYjyMSCph4960iz0-Kb7hZKf_L6f2vts57NMcVBI&hl=en

So far, we haven't seen any examples of this on the public Internet.  As far as I know, WebKit is the only rendering engine with this quirk.
Comment 2 Alexey Proskuryakov 2010-07-24 17:26:53 PDT
I'm wondering if this is at all related to bug 42900 (not directly of course, but perhaps historically).
Comment 3 Adam Barth 2010-07-25 06:09:29 PDT
I'm sure if I understand the connection.  Parsing HTML comments in script elements is somewhat tricky and takes up a bunch of states in the lexer.
Comment 4 Andy Estes 2010-09-22 13:40:25 PDT
Created attachment 68433 [details]
Patch
Comment 5 Adam Barth 2010-09-22 13:46:17 PDT
Comment on attachment 68433 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=68433&action=review

This looks good.  One minor quibble.

> WebCore/html/parser/HTMLTreeBuilder.cpp:2715
> +        if (m_usePreHTML5ParserQuirks && token.selfClosing()) {
> +            processSelfClosingScriptTag(token);
> +            return true;
> +        }
> +        
>          processScriptStartTag(token);

I would have expected the branch on the quirk to go after this line and then just process a fake end script tag.

> WebCore/html/parser/HTMLTreeBuilder.cpp:2760
> +    m_tree.insertScriptElement(token);
> +    m_originalInsertionMode = m_insertionMode;
> +    m_lastScriptElementStartLine = m_tokenizer->lineNumber();
> +    setInsertionMode(TextMode);

Can we share more code with processScriptStartTag ?

> LayoutTests/fast/parser/pre-html5-parser-quirks.html:18
> +<iframe src="resources/pre-html5-parser-quirk-self-closing-script.html"></iframe>

Can you add a test for the fragment mode too?  Also, please add a test for self-closing script tags in the head element b/c that goes through a slightly different code path.
Comment 6 Alexey Proskuryakov 2010-09-22 14:13:54 PDT
I think that this needs more tests for what is considered to be self-closing tag. E.g. <script /> is, but <script/> isn't. What about other whitespace?
Comment 7 Andy Estes 2010-09-22 14:47:37 PDT
(In reply to comment #5)
> (From update of attachment 68433 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=68433&action=review
> 
> This looks good.  One minor quibble.
> 
> > WebCore/html/parser/HTMLTreeBuilder.cpp:2715
> > +        if (m_usePreHTML5ParserQuirks && token.selfClosing()) {
> > +            processSelfClosingScriptTag(token);
> > +            return true;
> > +        }
> > +        
> >          processScriptStartTag(token);
> 
> I would have expected the branch on the quirk to go after this line and then just process a fake end script tag.
> 
> > WebCore/html/parser/HTMLTreeBuilder.cpp:2760
> > +    m_tree.insertScriptElement(token);
> > +    m_originalInsertionMode = m_insertionMode;
> > +    m_lastScriptElementStartLine = m_tokenizer->lineNumber();
> > +    setInsertionMode(TextMode);
> 
> Can we share more code with processScriptStartTag ?

I didn't call processScriptStartTag() directly since it would leave the tokenizer in the ScriptDataState. I suppose I could set it back to DataState before inserting the fake end tag. Does that seem cleaner than writing a separate routine?

> 
> > LayoutTests/fast/parser/pre-html5-parser-quirks.html:18
> > +<iframe src="resources/pre-html5-parser-quirk-self-closing-script.html"></iframe>
> 
> Can you add a test for the fragment mode too?  Also, please add a test for self-closing script tags in the head element b/c that goes through a slightly different code path.

Will do.
Comment 8 Andy Estes 2010-09-22 14:52:08 PDT
(In reply to comment #6)
> I think that this needs more tests for what is considered to be self-closing tag. E.g. <script /> is, but <script/> isn't. What about other whitespace?

The tokenizer decides whether or not the script token is a self-closing tag, and applies the same rules it would for any other self-closing tag. My assumption is that self-closing tags in general are covered by pre-existing tokenizer tests. If it isn't, then they should be added, but this doesn't seem like the right place for such a test.
Comment 9 Adam Barth 2010-09-22 15:02:57 PDT
Comment on attachment 68433 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=68433&action=review

>>> WebCore/html/parser/HTMLTreeBuilder.cpp:2715
>>>          processScriptStartTag(token);
>> 
>> I would have expected the branch on the quirk to go after this line and then just process a fake end script tag.
> 
> I didn't call processScriptStartTag() directly since it would leave the tokenizer in the ScriptDataState. I suppose I could set it back to DataState before inserting the fake end tag. Does that seem cleaner than writing a separate routine?

I see.  The issue is that this code:

http://trac.webkit.org/browser/trunk/WebCore/html/parser/HTMLTreeBuilder.cpp#L2197

doesn't set the tokenizer to the DataState because the tokenizer sets itself to the DataState.  What I'd do is make that code explicitly set the tokenizer back to the data state and add an assert that we're only not already in the data state when m_usePreHTML5ParserQuirks is true.  If we do that, then the rest of the machinery should do the right thing without much effort on our part.
Comment 10 Alexey Proskuryakov 2010-09-22 15:56:11 PDT
(In reply to comment #8)
> The tokenizer decides whether or not the script token is a self-closing tag, and applies the same rules it would for any other self-closing tag.

This is the new behavior - but in Safari 5, we treated <script/> and <script /> differently, according to my testing. Extending this quirk to <script/> should be a conscious change.
Comment 11 Adam Barth 2010-09-22 15:59:35 PDT
> This is the new behavior - but in Safari 5, we treated <script/> and <script /> differently, according to my testing. Extending this quirk to <script/> should be a conscious change.

I'm all for testing, but I'd like to try the HTML5 tokenization rules for self-closing tags in the hopes that they're sufficient here.  Changing them for this quirk is possible, of course, but less desirable.
Comment 12 Alexey Proskuryakov 2010-09-22 16:17:05 PDT
Sounds fine to me.
Comment 13 Andy Estes 2010-09-22 16:19:37 PDT
(In reply to comment #10)
> (In reply to comment #8)
> > The tokenizer decides whether or not the script token is a self-closing tag, and applies the same rules it would for any other self-closing tag.
> 
> This is the new behavior - but in Safari 5, we treated <script/> and <script /> differently, according to my testing. Extending this quirk to <script/> should be a conscious change.

Ohh I see. Can you describe the difference you are seeing?

I tried this:

<script src="data:text/javascript,alert('loaded script url')"/>PASS

and this:

<script src="data:text/javascript,alert('loaded script url')" />PASS

In Safari 5, both cases displayed the alert and printed 'PASS'; in WebKit nightly, neither did. That indicates to me that the quirk should be extended to both cases, but maybe I'm missing something.
Comment 14 Alexey Proskuryakov 2010-09-22 16:23:15 PDT
These are different in Safari 5.0.2:
<script/>alert(0)</script>
<script />alert(0)</script>
Comment 15 Andy Estes 2010-09-22 16:26:54 PDT
(In reply to comment #14)
> These are different in Safari 5.0.2:
> <script/>alert(0)</script>
> <script />alert(0)</script>

Ahh, okay thanks. It'll be good to at least have that documented as a change in behavior.
Comment 16 Andy Estes 2010-09-22 18:00:13 PDT
Created attachment 68484 [details]
Patch
Comment 17 Darin Adler 2010-09-22 18:29:41 PDT
Comment on attachment 68484 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=68484&action=review

I think Adam might want to do the final round of review, but I do have one comment.

> WebCore/html/parser/HTMLTreeBuilder.cpp:2220
> +                m_tokenizer->setState(HTMLTokenizer::DataState);
> +            }

I would write this as:

    ASSERT(m_tokenizer->state() == HTMLTokenizer::DataState || m_usePreHTML5ParserQuirks);
    m_tokenizer->setState(HTMLTokenizer::DataState);

Also, I’d try to keep the comment shorter.
Comment 18 Adam Barth 2010-09-22 19:11:34 PDT
Comment on attachment 68484 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=68484&action=review

>> WebCore/html/parser/HTMLTreeBuilder.cpp:2220
>> +            }
> 
> I would write this as:
> 
>     ASSERT(m_tokenizer->state() == HTMLTokenizer::DataState || m_usePreHTML5ParserQuirks);
>     m_tokenizer->setState(HTMLTokenizer::DataState);
> 
> Also, I’d try to keep the comment shorter.

Yep.

> WebCore/html/parser/HTMLTreeBuilder.cpp:2724
> +        if (m_usePreHTML5ParserQuirks && token.selfClosing())
> +            processFakeEndTag(scriptTag);

Beautiful.
Comment 19 Andy Estes 2010-09-22 19:16:27 PDT
Thanks darin and abarth!
Comment 20 Andy Estes 2010-09-22 19:39:40 PDT
Committed r68115: <http://trac.webkit.org/changeset/68115>