Summary: | <table><tr><td><svg><desc><td> parses incorrectly | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Ian 'Hixie' Hickson <ian> | ||||||||||
Component: | DOM | Assignee: | Adam Barth <abarth> | ||||||||||
Status: | RESOLVED FIXED | ||||||||||||
Severity: | Normal | CC: | abarth, eric, simonjam, tonyg | ||||||||||
Priority: | P2 | ||||||||||||
Version: | 528+ (Nightly build) | ||||||||||||
Hardware: | Unspecified | ||||||||||||
OS: | Unspecified | ||||||||||||
Attachments: |
|
Description
Ian 'Hixie' Hickson
2011-09-14 12:39:04 PDT
Thanks for the report. Fixing this bug will require updating the HTMLTreeBuilder state machine to match the updated state machine in the spec. Working on a patch for this bug. Created attachment 118947 [details]
work in progress
Comment on attachment 118947 [details]
work in progress
FWIW, I read the spec alongside this change and everything looks good to me. It'll be nice to be caught up with foreign content mode again.
> FWIW, I read the spec alongside this change and everything looks good to me. It'll be nice to be caught up with foreign content mode again.
Thanks. This patch uncovered a bug in the spec, which Hixie has just fixed. I'm going to try incorporating his fix and then we should have something reviewable.
Created attachment 119307 [details]
Patch
Comment on attachment 119307 [details] Patch Attachment 119307 [details] did not pass qt-ews (qt): Output: http://queues.webkit.org/results/10875456 Comment on attachment 119307 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=119307&action=review > Source/WebCore/html/parser/HTMLElementStack.cpp:293 > + // FIXME: Technically we shouldn't read back from the DOM here. You should explain what we're supposed to do instead. :) > Source/WebCore/html/parser/HTMLTreeBuilder.cpp:469 > + m_parser->tokenizer()->setShouldAllowCDATA(m_tree.openElements()->stackDepth() > 0 && !isInHTMLNamespace(m_tree.currentNode())); Do we have an !isEmpty() instead? > Source/WebCore/html/parser/HTMLTreeBuilder.cpp:2660 > + if (!m_tree.openElements()->stackDepth()) Again an isEmpty (if available) is probably clearer here. > Source/WebCore/html/parser/HTMLTreeBuilder.cpp:2664 > + ContainerNode* node = m_tree.currentNode(); > + if (!node) > + return false; So this is a hack around the hack of allowing non-nodes onto the element stack? Or how do we get to this case? > Source/WebCore/html/parser/HTMLTreeBuilder.cpp:2670 > + && token.name() != MathMLNames::mglyphTag.localName() > + && token.name() != MathMLNames::malignmarkTag.localName()) We don't have a shorter way to write this? either token.hasTagName(mglyphTag) or mglyphTag.hasLocalName(token.name)? > Source/WebCore/html/parser/HTMLTreeBuilder.cpp:2701 > + || token.name() == bigTag Contrasting this line with my previous comment, this seems cleaner. There must be an AtomicString == QualifiedName comparison operator? > Source/WebCore/html/parser/HTMLTreeBuilder.cpp:2765 > + if (token.name() == SVGNames::scriptTag && m_tree.currentNode()->hasTagName(SVGNames::scriptTag)) { > + m_isPaused = true; > + m_scriptToProcess = m_tree.currentElement(); > + m_tree.openElements()->pop(); > + return; > + } I'm not sure I fully follow this special case for script. > Source/WebCore/html/parser/HTMLTreeBuilder.cpp:2767 > + // FIXME: This code just wants an Element* iterator, instead of an ElementRecord* Are there other places in the code which want this? Do we have a bug for this request? > Source/WebCore/html/parser/HTMLTreeBuilder.cpp:2779 > + if (isInHTMLNamespace(nodeRecord->node())) > + break; So we pop all open elements unless we bottom out at some HTML, in which case we just process the token regularly? I'm confused as to why we break here. View in context: https://bugs.webkit.org/attachment.cgi?id=119307&action=review FYI, I compared this closely with the spec and it still looks good to me. > Source/WebCore/ChangeLog:16 > + Tested by the html5lib LayoutTests. These tests so the progression in I don't follow this last sentence. > Source/WebCore/html/parser/HTMLTreeBuilder.cpp:2334 > + // This check is the negation of what how we would normally write this This took a while to parse. It might be better just to do the if/else with the awkward fall through. I suspect it'd be easier to read, though possibly more error prone if someone modifies it. You're more familiar with WebKit though, so I'll leave it up to you. > > Source/WebCore/html/parser/HTMLTreeBuilder.cpp:2765 > > + if (token.name() == SVGNames::scriptTag && m_tree.currentNode()->hasTagName(SVGNames::scriptTag)) { > > + m_isPaused = true; > > + m_scriptToProcess = m_tree.currentElement(); > > + m_tree.openElements()->pop(); > > + return; > > + } > > I'm not sure I fully follow this special case for script. > > > Source/WebCore/html/parser/HTMLTreeBuilder.cpp:2767 > > + // FIXME: This code just wants an Element* iterator, instead of an ElementRecord* > > Are there other places in the code which want this? Do we have a bug for this request? This is all code that existed previously in the function that handled the "in foreign content" insertion mode. It's just moving to this new location. > > Source/WebCore/html/parser/HTMLTreeBuilder.cpp:2779 > > + if (isInHTMLNamespace(nodeRecord->node())) > > + break; > > So we pop all open elements unless we bottom out at some HTML, in which case we just process the token regularly? I'm confused as to why we break here. This is also code that existed previously. The idea is that if you have a mis-matched end tag, then we'll close up all the foreign content and then ask the rest of the tree builder state machine (which knows about HTML) to figure out what to do with the errant close tag. (In reply to comment #9) > View in context: https://bugs.webkit.org/attachment.cgi?id=119307&action=review > > FYI, I compared this closely with the spec and it still looks good to me. Thanks. > > Source/WebCore/ChangeLog:16 > > + Tested by the html5lib LayoutTests. These tests so the progression in > > I don't follow this last sentence. That's because it's ungrammatical. Fixed. > > Source/WebCore/html/parser/HTMLTreeBuilder.cpp:2334 > > + // This check is the negation of what how we would normally write this > > This took a while to parse. It might be better just to do the if/else with the awkward fall through. I suspect it'd be easier to read, though possibly more error prone if someone modifies it. You're more familiar with WebKit though, so I'll leave it up to you. Ok. Will do. Created attachment 119315 [details]
Patch
Comment on attachment 119315 [details] Patch Attachment 119315 [details] did not pass qt-ews (qt): Output: http://queues.webkit.org/results/10903001 View in context: https://bugs.webkit.org/attachment.cgi?id=119315&action=review Yeah, I like that better. It lines up perfectly with the spec. Thanks. > Source/WebCore/html/parser/HTMLTreeBuilder.cpp:2338 > + ASSERT(m_tree.currentNode()->isElementNode()); This isn't needed now. > Source/WebCore/html/parser/HTMLTreeBuilder.cpp:2339 > + ASSERT(insertionMode() == InTableMode || insertionMode() == InTableBodyMode || insertionMode() == InRowMode); This belongs at the top of the case before the if. > > Source/WebCore/html/parser/HTMLTreeBuilder.cpp:2338 > > + ASSERT(m_tree.currentNode()->isElementNode()); > > This isn't needed now. Removed. > > Source/WebCore/html/parser/HTMLTreeBuilder.cpp:2339 > > + ASSERT(insertionMode() == InTableMode || insertionMode() == InTableBodyMode || insertionMode() == InRowMode); > > This belongs at the top of the case before the if. Done. (I moved the other ASSERT too.) Created attachment 119320 [details]
Patch
Comment on attachment 119320 [details] Patch Attachment 119320 [details] did not pass qt-ews (qt): Output: http://queues.webkit.org/results/10905022 Comment on attachment 119320 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=119320&action=review LGTM. Please run the benchmark before landing. Why is Qt geborked? > Source/WebCore/html/parser/HTMLTreeBuilder.cpp:2784 > + String characters = String(token.characters().data(), token.characters().size()); Do we just have a copy() call? Did you test perf of this patch? > Why is Qt geborked?
It looks like Qt has a problem with its incremental build. It doesn't seem to be picking up the change to mathattrs.in.
I'm not even sure how to run the benchmark now that Safari automatically restarts if it thinks the page is hung. I think we should run the benchmark in DRT, but that requires disabling the watchdog. Which should be an easy patch. :) I'm going to try using a Chrome build since Chrome doesn't have the auto-restart timer. I'm glad I ran the benchmark. There was a slight slowdown. Inlining the new processCharacterBufferForInBody fixed it. Committed r102981: <http://trac.webkit.org/changeset/102981> |