RESOLVED FIXED 68106
<table><tr><td><svg><desc><td> parses incorrectly
https://bugs.webkit.org/show_bug.cgi?id=68106
Summary <table><tr><td><svg><desc><td> parses incorrectly
Ian 'Hixie' Hickson
Reported 2011-09-14 12:39:04 PDT
<table><tr><td><svg><desc><td> parses incorrectly. The second <td> should close the <svg>, as far as I can tell.
Attachments
work in progress (33.90 KB, patch)
2011-12-12 21:22 PST, Adam Barth
no flags
Patch (62.50 KB, patch)
2011-12-14 15:14 PST, Adam Barth
no flags
Patch (63.42 KB, patch)
2011-12-14 16:10 PST, Adam Barth
no flags
Patch (63.20 KB, patch)
2011-12-14 16:23 PST, Adam Barth
eric: review+
webkit-ews: commit-queue-
Adam Barth
Comment 1 2011-09-14 12:52:17 PDT
Thanks for the report. Fixing this bug will require updating the HTMLTreeBuilder state machine to match the updated state machine in the spec.
Adam Barth
Comment 2 2011-12-12 17:26:42 PST
Working on a patch for this bug.
Adam Barth
Comment 3 2011-12-12 21:22:04 PST
Created attachment 118947 [details] work in progress
James Simonsen
Comment 4 2011-12-13 15:28:08 PST
Comment on attachment 118947 [details] work in progress FWIW, I read the spec alongside this change and everything looks good to me. It'll be nice to be caught up with foreign content mode again.
Adam Barth
Comment 5 2011-12-13 15:36:35 PST
> FWIW, I read the spec alongside this change and everything looks good to me. It'll be nice to be caught up with foreign content mode again. Thanks. This patch uncovered a bug in the spec, which Hixie has just fixed. I'm going to try incorporating his fix and then we should have something reviewable.
Adam Barth
Comment 6 2011-12-14 15:14:51 PST
Early Warning System Bot
Comment 7 2011-12-14 15:22:58 PST
Eric Seidel (no email)
Comment 8 2011-12-14 15:44:17 PST
Comment on attachment 119307 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=119307&action=review > Source/WebCore/html/parser/HTMLElementStack.cpp:293 > + // FIXME: Technically we shouldn't read back from the DOM here. You should explain what we're supposed to do instead. :) > Source/WebCore/html/parser/HTMLTreeBuilder.cpp:469 > + m_parser->tokenizer()->setShouldAllowCDATA(m_tree.openElements()->stackDepth() > 0 && !isInHTMLNamespace(m_tree.currentNode())); Do we have an !isEmpty() instead? > Source/WebCore/html/parser/HTMLTreeBuilder.cpp:2660 > + if (!m_tree.openElements()->stackDepth()) Again an isEmpty (if available) is probably clearer here. > Source/WebCore/html/parser/HTMLTreeBuilder.cpp:2664 > + ContainerNode* node = m_tree.currentNode(); > + if (!node) > + return false; So this is a hack around the hack of allowing non-nodes onto the element stack? Or how do we get to this case? > Source/WebCore/html/parser/HTMLTreeBuilder.cpp:2670 > + && token.name() != MathMLNames::mglyphTag.localName() > + && token.name() != MathMLNames::malignmarkTag.localName()) We don't have a shorter way to write this? either token.hasTagName(mglyphTag) or mglyphTag.hasLocalName(token.name)? > Source/WebCore/html/parser/HTMLTreeBuilder.cpp:2701 > + || token.name() == bigTag Contrasting this line with my previous comment, this seems cleaner. There must be an AtomicString == QualifiedName comparison operator? > Source/WebCore/html/parser/HTMLTreeBuilder.cpp:2765 > + if (token.name() == SVGNames::scriptTag && m_tree.currentNode()->hasTagName(SVGNames::scriptTag)) { > + m_isPaused = true; > + m_scriptToProcess = m_tree.currentElement(); > + m_tree.openElements()->pop(); > + return; > + } I'm not sure I fully follow this special case for script. > Source/WebCore/html/parser/HTMLTreeBuilder.cpp:2767 > + // FIXME: This code just wants an Element* iterator, instead of an ElementRecord* Are there other places in the code which want this? Do we have a bug for this request? > Source/WebCore/html/parser/HTMLTreeBuilder.cpp:2779 > + if (isInHTMLNamespace(nodeRecord->node())) > + break; So we pop all open elements unless we bottom out at some HTML, in which case we just process the token regularly? I'm confused as to why we break here.
James Simonsen
Comment 9 2011-12-14 15:53:13 PST
View in context: https://bugs.webkit.org/attachment.cgi?id=119307&action=review FYI, I compared this closely with the spec and it still looks good to me. > Source/WebCore/ChangeLog:16 > + Tested by the html5lib LayoutTests. These tests so the progression in I don't follow this last sentence. > Source/WebCore/html/parser/HTMLTreeBuilder.cpp:2334 > + // This check is the negation of what how we would normally write this This took a while to parse. It might be better just to do the if/else with the awkward fall through. I suspect it'd be easier to read, though possibly more error prone if someone modifies it. You're more familiar with WebKit though, so I'll leave it up to you.
Adam Barth
Comment 10 2011-12-14 15:58:55 PST
> > Source/WebCore/html/parser/HTMLTreeBuilder.cpp:2765 > > + if (token.name() == SVGNames::scriptTag && m_tree.currentNode()->hasTagName(SVGNames::scriptTag)) { > > + m_isPaused = true; > > + m_scriptToProcess = m_tree.currentElement(); > > + m_tree.openElements()->pop(); > > + return; > > + } > > I'm not sure I fully follow this special case for script. > > > Source/WebCore/html/parser/HTMLTreeBuilder.cpp:2767 > > + // FIXME: This code just wants an Element* iterator, instead of an ElementRecord* > > Are there other places in the code which want this? Do we have a bug for this request? This is all code that existed previously in the function that handled the "in foreign content" insertion mode. It's just moving to this new location. > > Source/WebCore/html/parser/HTMLTreeBuilder.cpp:2779 > > + if (isInHTMLNamespace(nodeRecord->node())) > > + break; > > So we pop all open elements unless we bottom out at some HTML, in which case we just process the token regularly? I'm confused as to why we break here. This is also code that existed previously. The idea is that if you have a mis-matched end tag, then we'll close up all the foreign content and then ask the rest of the tree builder state machine (which knows about HTML) to figure out what to do with the errant close tag.
Adam Barth
Comment 11 2011-12-14 16:08:54 PST
(In reply to comment #9) > View in context: https://bugs.webkit.org/attachment.cgi?id=119307&action=review > > FYI, I compared this closely with the spec and it still looks good to me. Thanks. > > Source/WebCore/ChangeLog:16 > > + Tested by the html5lib LayoutTests. These tests so the progression in > > I don't follow this last sentence. That's because it's ungrammatical. Fixed. > > Source/WebCore/html/parser/HTMLTreeBuilder.cpp:2334 > > + // This check is the negation of what how we would normally write this > > This took a while to parse. It might be better just to do the if/else with the awkward fall through. I suspect it'd be easier to read, though possibly more error prone if someone modifies it. You're more familiar with WebKit though, so I'll leave it up to you. Ok. Will do.
Adam Barth
Comment 12 2011-12-14 16:10:53 PST
Early Warning System Bot
Comment 13 2011-12-14 16:15:25 PST
James Simonsen
Comment 14 2011-12-14 16:16:17 PST
View in context: https://bugs.webkit.org/attachment.cgi?id=119315&action=review Yeah, I like that better. It lines up perfectly with the spec. Thanks. > Source/WebCore/html/parser/HTMLTreeBuilder.cpp:2338 > + ASSERT(m_tree.currentNode()->isElementNode()); This isn't needed now. > Source/WebCore/html/parser/HTMLTreeBuilder.cpp:2339 > + ASSERT(insertionMode() == InTableMode || insertionMode() == InTableBodyMode || insertionMode() == InRowMode); This belongs at the top of the case before the if.
Adam Barth
Comment 15 2011-12-14 16:21:01 PST
> > Source/WebCore/html/parser/HTMLTreeBuilder.cpp:2338 > > + ASSERT(m_tree.currentNode()->isElementNode()); > > This isn't needed now. Removed. > > Source/WebCore/html/parser/HTMLTreeBuilder.cpp:2339 > > + ASSERT(insertionMode() == InTableMode || insertionMode() == InTableBodyMode || insertionMode() == InRowMode); > > This belongs at the top of the case before the if. Done. (I moved the other ASSERT too.)
Adam Barth
Comment 16 2011-12-14 16:23:33 PST
Early Warning System Bot
Comment 17 2011-12-14 16:42:13 PST
Eric Seidel (no email)
Comment 18 2011-12-15 01:18:12 PST
Comment on attachment 119320 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=119320&action=review LGTM. Please run the benchmark before landing. Why is Qt geborked? > Source/WebCore/html/parser/HTMLTreeBuilder.cpp:2784 > + String characters = String(token.characters().data(), token.characters().size()); Do we just have a copy() call? Did you test perf of this patch?
Adam Barth
Comment 19 2011-12-15 02:51:20 PST
> Why is Qt geborked? It looks like Qt has a problem with its incremental build. It doesn't seem to be picking up the change to mathattrs.in.
Adam Barth
Comment 20 2011-12-15 03:09:40 PST
I'm not even sure how to run the benchmark now that Safari automatically restarts if it thinks the page is hung.
Eric Seidel (no email)
Comment 21 2011-12-15 10:57:36 PST
I think we should run the benchmark in DRT, but that requires disabling the watchdog. Which should be an easy patch. :)
Adam Barth
Comment 22 2011-12-15 11:36:57 PST
I'm going to try using a Chrome build since Chrome doesn't have the auto-restart timer.
Adam Barth
Comment 23 2011-12-15 13:00:13 PST
I'm glad I ran the benchmark. There was a slight slowdown. Inlining the new processCharacterBufferForInBody fixed it.
Adam Barth
Comment 24 2011-12-15 13:35:30 PST
Note You need to log in before you can comment on or make changes to this bug.