Bug 68106

Summary: <table><tr><td><svg><desc><td> parses incorrectly
Product: WebKit Reporter: Ian 'Hixie' Hickson <ian>
Component: DOMAssignee: Adam Barth <abarth>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, eric, simonjam, tonyg
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Attachments:
Description Flags
work in progress
none
Patch
none
Patch
none
Patch eric: review+, webkit-ews: commit-queue-

Description Ian 'Hixie' Hickson 2011-09-14 12:39:04 PDT
<table><tr><td><svg><desc><td> parses incorrectly. The second <td> should close the <svg>, as far as I can tell.
Comment 1 Adam Barth 2011-09-14 12:52:17 PDT
Thanks for the report.  Fixing this bug will require updating the HTMLTreeBuilder state machine to match the updated state machine in the spec.
Comment 2 Adam Barth 2011-12-12 17:26:42 PST
Working on a patch for this bug.
Comment 3 Adam Barth 2011-12-12 21:22:04 PST
Created attachment 118947 [details]
work in progress
Comment 4 James Simonsen 2011-12-13 15:28:08 PST
Comment on attachment 118947 [details]
work in progress

FWIW, I read the spec alongside this change and everything looks good to me. It'll be nice to be caught up with foreign content mode again.
Comment 5 Adam Barth 2011-12-13 15:36:35 PST
> FWIW, I read the spec alongside this change and everything looks good to me. It'll be nice to be caught up with foreign content mode again.

Thanks.  This patch uncovered a bug in the spec, which Hixie has just fixed.  I'm going to try incorporating his fix and then we should have something reviewable.
Comment 6 Adam Barth 2011-12-14 15:14:51 PST
Created attachment 119307 [details]
Patch
Comment 7 Early Warning System Bot 2011-12-14 15:22:58 PST
Comment on attachment 119307 [details]
Patch

Attachment 119307 [details] did not pass qt-ews (qt):
Output: http://queues.webkit.org/results/10875456
Comment 8 Eric Seidel (no email) 2011-12-14 15:44:17 PST
Comment on attachment 119307 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=119307&action=review

> Source/WebCore/html/parser/HTMLElementStack.cpp:293
> +        // FIXME: Technically we shouldn't read back from the DOM here.

You should explain what we're supposed to do instead. :)

> Source/WebCore/html/parser/HTMLTreeBuilder.cpp:469
> +    m_parser->tokenizer()->setShouldAllowCDATA(m_tree.openElements()->stackDepth() > 0 && !isInHTMLNamespace(m_tree.currentNode()));

Do we have an !isEmpty() instead?

> Source/WebCore/html/parser/HTMLTreeBuilder.cpp:2660
> +    if (!m_tree.openElements()->stackDepth())

Again an isEmpty (if available) is probably clearer here.

> Source/WebCore/html/parser/HTMLTreeBuilder.cpp:2664
> +    ContainerNode* node = m_tree.currentNode();
> +    if (!node)
> +        return false;

So this is a hack around the hack of allowing non-nodes onto the element stack?   Or how do we get to this case?

> Source/WebCore/html/parser/HTMLTreeBuilder.cpp:2670
> +            && token.name() != MathMLNames::mglyphTag.localName()
> +            && token.name() != MathMLNames::malignmarkTag.localName())

We don't have a shorter way to write this?  either token.hasTagName(mglyphTag) or mglyphTag.hasLocalName(token.name)?

> Source/WebCore/html/parser/HTMLTreeBuilder.cpp:2701
> +            || token.name() == bigTag

Contrasting this line with my previous comment, this seems cleaner.  There must be an AtomicString == QualifiedName comparison operator?

> Source/WebCore/html/parser/HTMLTreeBuilder.cpp:2765
> +        if (token.name() == SVGNames::scriptTag && m_tree.currentNode()->hasTagName(SVGNames::scriptTag)) {
> +            m_isPaused = true;
> +            m_scriptToProcess = m_tree.currentElement();
> +            m_tree.openElements()->pop();
> +            return;
> +        }

I'm not sure I fully follow this special case for script.

> Source/WebCore/html/parser/HTMLTreeBuilder.cpp:2767
> +            // FIXME: This code just wants an Element* iterator, instead of an ElementRecord*

Are there other places in the code which want this?  Do we have a bug for this request?

> Source/WebCore/html/parser/HTMLTreeBuilder.cpp:2779
> +                if (isInHTMLNamespace(nodeRecord->node()))
> +                    break;

So we pop all open elements unless we bottom out at some HTML, in which case we just process the token regularly?  I'm confused as to why we break here.
Comment 9 James Simonsen 2011-12-14 15:53:13 PST
View in context: https://bugs.webkit.org/attachment.cgi?id=119307&action=review

FYI, I compared this closely with the spec and it still looks good to me.

> Source/WebCore/ChangeLog:16
> +        Tested by the html5lib LayoutTests.  These tests so the progression in

I don't follow this last sentence.

> Source/WebCore/html/parser/HTMLTreeBuilder.cpp:2334
> +        // This check is the negation of what how we would normally write this

This took a while to parse. It might be better just to do the if/else with the awkward fall through. I suspect it'd be easier to read, though possibly more error prone if someone modifies it. You're more familiar with WebKit though, so I'll leave it up to you.
Comment 10 Adam Barth 2011-12-14 15:58:55 PST
> > Source/WebCore/html/parser/HTMLTreeBuilder.cpp:2765
> > +        if (token.name() == SVGNames::scriptTag && m_tree.currentNode()->hasTagName(SVGNames::scriptTag)) {
> > +            m_isPaused = true;
> > +            m_scriptToProcess = m_tree.currentElement();
> > +            m_tree.openElements()->pop();
> > +            return;
> > +        }
> 
> I'm not sure I fully follow this special case for script.
> 
> > Source/WebCore/html/parser/HTMLTreeBuilder.cpp:2767
> > +            // FIXME: This code just wants an Element* iterator, instead of an ElementRecord*
> 
> Are there other places in the code which want this?  Do we have a bug for this request?

This is all code that existed previously in the function that handled the "in foreign content" insertion mode.  It's just moving to this new location.

> > Source/WebCore/html/parser/HTMLTreeBuilder.cpp:2779
> > +                if (isInHTMLNamespace(nodeRecord->node()))
> > +                    break;
> 
> So we pop all open elements unless we bottom out at some HTML, in which case we just process the token regularly?  I'm confused as to why we break here.

This is also code that existed previously.  The idea is that if you have a mis-matched end tag, then we'll close up all the foreign content and then ask the rest of the tree builder state machine (which knows about HTML) to figure out what to do with the errant close tag.
Comment 11 Adam Barth 2011-12-14 16:08:54 PST
(In reply to comment #9)
> View in context: https://bugs.webkit.org/attachment.cgi?id=119307&action=review
> 
> FYI, I compared this closely with the spec and it still looks good to me.

Thanks.

> > Source/WebCore/ChangeLog:16
> > +        Tested by the html5lib LayoutTests.  These tests so the progression in
> 
> I don't follow this last sentence.

That's because it's ungrammatical.  Fixed.

> > Source/WebCore/html/parser/HTMLTreeBuilder.cpp:2334
> > +        // This check is the negation of what how we would normally write this
> 
> This took a while to parse. It might be better just to do the if/else with the awkward fall through. I suspect it'd be easier to read, though possibly more error prone if someone modifies it. You're more familiar with WebKit though, so I'll leave it up to you.

Ok.  Will do.
Comment 12 Adam Barth 2011-12-14 16:10:53 PST
Created attachment 119315 [details]
Patch
Comment 13 Early Warning System Bot 2011-12-14 16:15:25 PST
Comment on attachment 119315 [details]
Patch

Attachment 119315 [details] did not pass qt-ews (qt):
Output: http://queues.webkit.org/results/10903001
Comment 14 James Simonsen 2011-12-14 16:16:17 PST
View in context: https://bugs.webkit.org/attachment.cgi?id=119315&action=review

Yeah, I like that better. It lines up perfectly with the spec. Thanks.

> Source/WebCore/html/parser/HTMLTreeBuilder.cpp:2338
> +            ASSERT(m_tree.currentNode()->isElementNode());

This isn't needed now.

> Source/WebCore/html/parser/HTMLTreeBuilder.cpp:2339
> +            ASSERT(insertionMode() == InTableMode || insertionMode() == InTableBodyMode || insertionMode() == InRowMode);

This belongs at the top of the case before the if.
Comment 15 Adam Barth 2011-12-14 16:21:01 PST
> > Source/WebCore/html/parser/HTMLTreeBuilder.cpp:2338
> > +            ASSERT(m_tree.currentNode()->isElementNode());
> 
> This isn't needed now.

Removed.

> > Source/WebCore/html/parser/HTMLTreeBuilder.cpp:2339
> > +            ASSERT(insertionMode() == InTableMode || insertionMode() == InTableBodyMode || insertionMode() == InRowMode);
> 
> This belongs at the top of the case before the if.

Done.  (I moved the other ASSERT too.)
Comment 16 Adam Barth 2011-12-14 16:23:33 PST
Created attachment 119320 [details]
Patch
Comment 17 Early Warning System Bot 2011-12-14 16:42:13 PST
Comment on attachment 119320 [details]
Patch

Attachment 119320 [details] did not pass qt-ews (qt):
Output: http://queues.webkit.org/results/10905022
Comment 18 Eric Seidel (no email) 2011-12-15 01:18:12 PST
Comment on attachment 119320 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=119320&action=review

LGTM.  Please run the benchmark before landing.  Why is Qt geborked?

> Source/WebCore/html/parser/HTMLTreeBuilder.cpp:2784
> +        String characters = String(token.characters().data(), token.characters().size());

Do we just have a copy() call?  Did you test perf of this patch?
Comment 19 Adam Barth 2011-12-15 02:51:20 PST
> Why is Qt geborked?

It looks like Qt has a problem with its incremental build.  It doesn't seem to be picking up the change to mathattrs.in.
Comment 20 Adam Barth 2011-12-15 03:09:40 PST
I'm not even sure how to run the benchmark now that Safari automatically restarts if it thinks the page is hung.
Comment 21 Eric Seidel (no email) 2011-12-15 10:57:36 PST
I think we should run the benchmark in DRT, but that requires disabling the watchdog.  Which should be an easy patch. :)
Comment 22 Adam Barth 2011-12-15 11:36:57 PST
I'm going to try using a Chrome build since Chrome doesn't have the auto-restart timer.
Comment 23 Adam Barth 2011-12-15 13:00:13 PST
I'm glad I ran the benchmark.  There was a slight slowdown.  Inlining the new processCharacterBufferForInBody fixed it.
Comment 24 Adam Barth 2011-12-15 13:35:30 PST
Committed r102981: <http://trac.webkit.org/changeset/102981>