Bug 109495

Summary: Fix svg/in-html/script-write.html with threaded HTML parser
Product: WebKit Reporter: Tony Gentilcore <tonyg>
Component: New BugsAssignee: Tony Gentilcore <tonyg>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, eric, ojan.autocc, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on:    
Bug Blocks: 106127    
Attachments:
Description Flags
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch
none
Patch for landing none

Description Tony Gentilcore 2013-02-11 14:53:25 PST
Fix svg/in-html/script-write.html with threaded HTML parser
Comment 1 Tony Gentilcore 2013-02-11 14:57:39 PST
Created attachment 187688 [details]
Patch
Comment 2 Adam Barth 2013-02-11 15:03:12 PST
Comment on attachment 187688 [details]
Patch

This is probably related to different ways of parsing scripts in foreign content.  It's likely to be a bug with the threaded parser.
Comment 3 Tony Gentilcore 2013-02-11 18:18:35 PST
Created attachment 187742 [details]
Patch
Comment 4 Tony Gentilcore 2013-02-11 18:19:23 PST
(In reply to comment #2)
> (From update of attachment 187688 [details])
> This is probably related to different ways of parsing scripts in foreign content.  It's likely to be a bug with the threaded parser.

Good catch, there was a behavior difference here.
Comment 5 Tony Gentilcore 2013-02-11 18:27:28 PST
Created attachment 187745 [details]
Patch
Comment 6 Eric Seidel (no email) 2013-02-11 18:38:36 PST
Comment on attachment 187745 [details]
Patch

This seems reasonable.  All of those tags are HTML tags we would be handling.  Are we sure the m_inForeignContent is correctly "false" for html in SVG?  I'm not even sure what that's supposed to do... if say we were to encounter a <html><svg><foreignObject><plaintext>
Comment 7 Tony Gentilcore 2013-02-11 18:44:40 PST
> Are we sure the m_inForeignContent is correctly "false" for html in SVG?  I'm not even sure what that's supposed to do... if say we were to encounter a <html><svg><foreignObject><plaintext>

It seems worth adding a test case for that separately. But this patch just causes us to match the current behavior.
Comment 8 Eric Seidel (no email) 2013-02-11 19:14:56 PST
Comment on attachment 187745 [details]
Patch

I don't wish to derail your bug, but I think we should at least test this case with your patch:

<!DOCTYPE html>
<html>
    <body>
        <svg id="svg1" width="200" height="100" xmlns="http://www.w3.org/2000/svg">
            <foreignObject x="0" y="0" width="200" height="100">
                <div>foo</div>
                <plaintext>
            </foreignObject>
        </svg>
    	<div>bar</div>
	</body>
</html>

And then either file a bug if we're changing behavior, or commit the test along with your change.  The current behavior is kinda odd. :)
Comment 9 Eric Seidel (no email) 2013-02-11 19:15:11 PST
I should note that our current behavior matches FF. :)
Comment 10 Tony Gentilcore 2013-02-13 11:49:48 PST
Created attachment 188137 [details]
Patch
Comment 11 Tony Gentilcore 2013-02-13 11:52:14 PST
Your test case was a good one. Matching that current behavior exposed a failure in some other svg tests. So we had to improve simulateTreeBuilder a little more. The ChangeLog explains. We pass all fast/parser and svg tests with this patch now.
Comment 12 Adam Barth 2013-02-13 11:53:11 PST
Comment on attachment 188137 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=188137&action=review

> LayoutTests/fast/parser/foreignobject-in-foreigncontent.html:3
> +    <body>

Can we use dump-as-markup.js for this test?
Comment 13 Adam Barth 2013-02-13 11:53:51 PST
Comment on attachment 188137 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=188137&action=review

> Source/WebCore/html/parser/BackgroundHTMLParser.cpp:149
> +                || threadSafeMatch(tagName, bTag)
> +                || threadSafeMatch(tagName, bigTag)
> +                || threadSafeMatch(tagName, blockquoteTag)
> +                || threadSafeMatch(tagName, bodyTag)
> +                || threadSafeMatch(tagName, brTag)
> +                || threadSafeMatch(tagName, centerTag)
> +                || threadSafeMatch(tagName, codeTag)
> +                || threadSafeMatch(tagName, ddTag)
> +                || threadSafeMatch(tagName, divTag)
> +                || threadSafeMatch(tagName, dlTag)
> +                || threadSafeMatch(tagName, dtTag)
> +                || threadSafeMatch(tagName, emTag)
> +                || threadSafeMatch(tagName, embedTag)
> +                || threadSafeMatch(tagName, h1Tag)
> +                || threadSafeMatch(tagName, h2Tag)
> +                || threadSafeMatch(tagName, h3Tag)
> +                || threadSafeMatch(tagName, h4Tag)
> +                || threadSafeMatch(tagName, h5Tag)
> +                || threadSafeMatch(tagName, h6Tag)
> +                || threadSafeMatch(tagName, headTag)
> +                || threadSafeMatch(tagName, hrTag)
> +                || threadSafeMatch(tagName, iTag)
> +                || threadSafeMatch(tagName, imgTag)
> +                || threadSafeMatch(tagName, liTag)
> +                || threadSafeMatch(tagName, listingTag)
> +                || threadSafeMatch(tagName, menuTag)

Can we put all these into an inline helper function?
Comment 14 Eric Seidel (no email) 2013-02-13 11:55:58 PST
Comment on attachment 188137 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=188137&action=review

> Source/WebCore/html/parser/BackgroundHTMLParser.cpp:123
> +        if (m_inForeignContent
> +            && (equalIgnoringCase(tagName, SVGNames::foreignObjectTag.localName())

So a <foreignObject> tag inside mathml?  I think we need to know what namespace we're in to do this right. :(
Comment 15 Tony Gentilcore 2013-02-13 12:34:35 PST
Created attachment 188148 [details]
Patch
Comment 16 Tony Gentilcore 2013-02-13 12:35:56 PST
> Can we use dump-as-markup.js for this test?

Done

> Can we put all these into an inline helper function?

Done

> So a <foreignObject> tag inside mathml?  I think we need to know what namespace we're in to do this right. :(

I split m_inForeignContent into m_inSVG and m_inMathML.
Comment 17 Adam Barth 2013-02-13 12:38:04 PST
Comment on attachment 188148 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=188148&action=review

> Source/WebCore/html/parser/BackgroundHTMLParser.cpp:173
> +            m_inMathML = true;

Can we be both m_inSVG and m_inMathML?  If not, we might want to use an enum that has only the possible states.
Comment 18 Eric Seidel (no email) 2013-02-13 13:30:23 PST
Comment on attachment 188148 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=188148&action=review

> Source/WebCore/html/parser/BackgroundHTMLParser.cpp:177
> +        if (inForeignContent() && tokenExitsForeignContent(token)) {
> +            m_inSVG = false;
> +            m_inMathML = false;
> +        }

I'm concerned that this won't handle:

<html><svg><foreignObject></foreignObject><title></svg>This text should appear correctly since we'll think we're in HTML as soon as we leave the fo?
Comment 19 Tony Gentilcore 2013-02-13 15:34:45 PST
Created attachment 188197 [details]
Patch
Comment 20 WebKit Review Bot 2013-02-13 15:37:07 PST
Attachment 188197 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'LayoutTests/ChangeLog', u'LayoutTests/fast/parser/foreignobject-in-foreigncontent-expected.txt', u'LayoutTests/fast/parser/foreignobject-in-foreigncontent.html', u'LayoutTests/fast/parser/ignore-title-in-svg-after-foreignobject-expected.txt', u'LayoutTests/fast/parser/ignore-title-in-svg-after-foreignobject.html', u'LayoutTests/fast/parser/unmatched-close-foreignobject-expected.txt', u'LayoutTests/fast/parser/unmatched-close-foreignobject.html', u'Source/WebCore/ChangeLog', u'Source/WebCore/html/parser/BackgroundHTMLParser.cpp', u'Source/WebCore/html/parser/BackgroundHTMLParser.h', u'Source/WebCore/html/parser/CompactHTMLToken.cpp', u'Source/WebCore/html/parser/CompactHTMLToken.h']" exit_code: 1
Source/WebCore/html/parser/BackgroundHTMLParser.h:74:  enum members should use InterCaps with an initial capital letter.  [readability/enum_casing] [4]
Source/WebCore/html/parser/BackgroundHTMLParser.h:75:  enum members should use InterCaps with an initial capital letter.  [readability/enum_casing] [4]
Total errors found: 2 in 12 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 21 Tony Gentilcore 2013-02-13 15:37:16 PST
> Can we be both m_inSVG and m_inMathML?  If not, we might want to use an enum that has only the possible states.

Done

> > Source/WebCore/html/parser/BackgroundHTMLParser.cpp:177
> > +        if (inForeignContent() && tokenExitsForeignContent(token)) {
> > +            m_inSVG = false;
> > +            m_inMathML = false;
> > +        }
> 
> I'm concerned that this won't handle:
> 
> <html><svg><foreignObject></foreignObject><title></svg>This text should appear correctly since we'll think we're in HTML as soon as we leave the fo?

You were right all along: this requires a stack. I added that and your test case. Our behavior matches the current parser on all the new tests. Please let me know if you can think of other interesting edge cases we should test.
Comment 22 Tony Gentilcore 2013-02-13 15:37:57 PST
(In reply to comment #20)
> Attachment 188197 [details] did not pass style-queue:
> 
> Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'LayoutTests/ChangeLog', u'LayoutTests/fast/parser/foreignobject-in-foreigncontent-expected.txt', u'LayoutTests/fast/parser/foreignobject-in-foreigncontent.html', u'LayoutTests/fast/parser/ignore-title-in-svg-after-foreignobject-expected.txt', u'LayoutTests/fast/parser/ignore-title-in-svg-after-foreignobject.html', u'LayoutTests/fast/parser/unmatched-close-foreignobject-expected.txt', u'LayoutTests/fast/parser/unmatched-close-foreignobject.html', u'Source/WebCore/ChangeLog', u'Source/WebCore/html/parser/BackgroundHTMLParser.cpp', u'Source/WebCore/html/parser/BackgroundHTMLParser.h', u'Source/WebCore/html/parser/CompactHTMLToken.cpp', u'Source/WebCore/html/parser/CompactHTMLToken.h']" exit_code: 1
> Source/WebCore/html/parser/BackgroundHTMLParser.h:74:  enum members should use InterCaps with an initial capital letter.  [readability/enum_casing] [4]
> Source/WebCore/html/parser/BackgroundHTMLParser.h:75:  enum members should use InterCaps with an initial capital letter.  [readability/enum_casing] [4]
> Total errors found: 2 in 12 files
> 
> 
> If any of these errors are false positives, please file a bug against check-webkit-style.

I think it is better to have HTML and SVG enum values than Html and Svg. Please let me know if otherwise.
Comment 23 Eric Seidel (no email) 2013-02-13 15:44:40 PST
Comment on attachment 188197 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=188197&action=review

> Source/WebCore/html/parser/BackgroundHTMLParser.cpp:62
> +    // FIXME: This is copied from HTMLTreeBuilder::processTokenInForeignContent and changed to use threadSafeMatch.
> +    const String& tagName = token.data();
> +    return threadSafeMatch(tagName, bTag)

Our technical debt for not solving the AtomicString problem is mounting... :p

> Source/WebCore/html/parser/BackgroundHTMLParser.h:88
> +    Vector<Namespace> m_namespaceStack;

Do we want any initial capacity?  Do we worry about keeping this up to date with document.writes which are processed on the main thread?
Comment 24 Tony Gentilcore 2013-02-13 15:54:00 PST
Created attachment 188206 [details]
Patch
Comment 25 Tony Gentilcore 2013-02-13 15:55:40 PST
> Do we want any initial capacity?

I think it is safe to say SVG and MathML are rare enough that we should use 1 as the initial capacity (to account for the HTML namespace added in the ctor).

> Do we worry about keeping this up to date with document.writes which are processed on the main thread?

That's a tough one. Do you have any ideas how we can handle that?
Comment 26 WebKit Review Bot 2013-02-13 15:58:08 PST
Attachment 188206 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'LayoutTests/ChangeLog', u'LayoutTests/fast/parser/foreignobject-in-foreigncontent-expected.txt', u'LayoutTests/fast/parser/foreignobject-in-foreigncontent.html', u'LayoutTests/fast/parser/ignore-title-in-svg-after-foreignobject-expected.txt', u'LayoutTests/fast/parser/ignore-title-in-svg-after-foreignobject.html', u'LayoutTests/fast/parser/unmatched-close-foreignobject-expected.txt', u'LayoutTests/fast/parser/unmatched-close-foreignobject.html', u'Source/WebCore/ChangeLog', u'Source/WebCore/html/parser/BackgroundHTMLParser.cpp', u'Source/WebCore/html/parser/BackgroundHTMLParser.h', u'Source/WebCore/html/parser/CompactHTMLToken.cpp', u'Source/WebCore/html/parser/CompactHTMLToken.h']" exit_code: 1
Source/WebCore/html/parser/BackgroundHTMLParser.h:74:  enum members should use InterCaps with an initial capital letter.  [readability/enum_casing] [4]
Source/WebCore/html/parser/BackgroundHTMLParser.h:75:  enum members should use InterCaps with an initial capital letter.  [readability/enum_casing] [4]
Total errors found: 2 in 12 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 27 Adam Barth 2013-02-13 16:07:27 PST
> That's a tough one. Do you have any ideas how we can handle that?

Yeah, we'll need to read the information from the stack of open elements and send it to the background HTML parser in the Checkpoint when we call "resumeFrom".  Don't worry about it for this patch, but please file a bug so that we remember to write tests and fix it.  :)
Comment 28 Tony Gentilcore 2013-02-13 16:14:08 PST
(In reply to comment #27)
> > That's a tough one. Do you have any ideas how we can handle that?
> 
> Yeah, we'll need to read the information from the stack of open elements and send it to the background HTML parser in the Checkpoint when we call "resumeFrom".  Don't worry about it for this patch, but please file a bug so that we remember to write tests and fix it.  :)

Makes sense. Filed bug 109764.
Comment 29 Eric Seidel (no email) 2013-02-13 16:18:32 PST
Comment on attachment 188206 [details]
Patch

I think this is another big step forward, and we should land and iterate.  I suspect that most of the tests you added could have been written as html5lib tests instead.  If you want to explore that before landing, be my guest. :)
Comment 30 Eric Seidel (no email) 2013-02-13 16:19:49 PST
Comment on attachment 188206 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=188206&action=review

> Source/WebCore/html/parser/BackgroundHTMLParser.cpp:62
> +    return threadSafeMatch(tagName, bTag)

Actually.  This is gonna be epiclly slow.  I'm not sure we can do this?

Right?  This is going to run on every start/end token and do slow string compares?  I guess it will just compare the hashes...
Comment 31 Tony Gentilcore 2013-02-13 16:24:40 PST
> I think this is another big step forward, and we should land and iterate.  I suspect that most of the tests you added could have been written as html5lib tests instead.  If you want to explore that before landing, be my guest. :)

I'll do that before landing.

> > Source/WebCore/html/parser/BackgroundHTMLParser.cpp:62
> > +    return threadSafeMatch(tagName, bTag)
> 
> Actually.  This is gonna be epiclly slow.  I'm not sure we can do this?
> 
> Right?  This is going to run on every start/end token and do slow string compares?  I guess it will just compare the hashes...

The only reason I'm okay with it is that we only do it when we're in foreign content mode. If not in foreign content mode, we shortcut.
Comment 32 Tony Gentilcore 2013-02-13 16:48:46 PST
Created attachment 188224 [details]
Patch for landing
Comment 33 Adam Barth 2013-02-13 16:57:43 PST
(In reply to comment #30)
> (From update of attachment 188206 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=188206&action=review
> 
> > Source/WebCore/html/parser/BackgroundHTMLParser.cpp:62
> > +    return threadSafeMatch(tagName, bTag)
> 
> Actually.  This is gonna be epiclly slow.  I'm not sure we can do this?
> 
> Right?  This is going to run on every start/end token and do slow string compares?  I guess it will just compare the hashes...

Yeah, it just compares the hashes.  It might be worth trying the patch in bug 107337 again to see if it's a win now that we're doing more comparisons.  This only runs in foreign content mode, so we might need to make a new svg-in-html parsing benchmark to see the effect.
Comment 34 WebKit Review Bot 2013-02-13 17:27:53 PST
Comment on attachment 188224 [details]
Patch for landing

Clearing flags on attachment: 188224

Committed r142829: <http://trac.webkit.org/changeset/142829>
Comment 35 WebKit Review Bot 2013-02-13 17:27:59 PST
All reviewed patches have been landed.  Closing bug.