RESOLVED FIXED 43586
Use the HTML5 TreeBuilder for Fragment Parsing
https://bugs.webkit.org/show_bug.cgi?id=43586
Summary Use the HTML5 TreeBuilder for Fragment Parsing
Eric Seidel (no email)
Reported 2010-08-05 16:13:35 PDT
Use the HTML5 TreeBuilder for Fragment Parsing
Attachments
Patch (25.96 KB, patch)
2010-08-05 16:13 PDT, Eric Seidel (no email)
no flags
crashes due to DocumentParser use after free? (42.22 KB, patch)
2010-08-06 17:31 PDT, Eric Seidel (no email)
no flags
works (30.86 KB, patch)
2010-08-07 21:27 PDT, Eric Seidel (no email)
no flags
Patch (39.72 KB, patch)
2010-08-09 09:28 PDT, Eric Seidel (no email)
no flags
Patch (41.40 KB, patch)
2010-08-09 12:19 PDT, Eric Seidel (no email)
no flags
Patch for landing (41.38 KB, patch)
2010-08-09 12:40 PDT, Eric Seidel (no email)
eric: commit-queue-
Eric Seidel (no email)
Comment 1 2010-08-05 16:13:56 PDT
Eric Seidel (no email)
Comment 2 2010-08-06 17:31:17 PDT
Created attachment 63788 [details] crashes due to DocumentParser use after free?
Eric Seidel (no email)
Comment 3 2010-08-07 21:27:25 PDT
Eric Seidel (no email)
Comment 4 2010-08-09 09:28:27 PDT
Darin Adler
Comment 5 2010-08-09 09:33:17 PDT
Comment on attachment 63903 [details] Patch > // FIXME: parse error > + // fragment case handled by callers. > mergeAttributesFromTokenIntoElement(token, m_openElements.bodyElement()); This is a confusing comment. Exactly what is handled by the callers? Is there some way to assert or check that? It's also strange that the comment has a period but does not start with a capital letter. > +// This is a direct transcription of step 4 from: > +// http://www.whatwg.org/specs/web-apps/current-work/multipage/the-end.html#fragment-case > +HTMLTokenizer::State tokenizerStateForContextElement(Element* contextElement, bool reportErrors) > +{ > + if (!contextElement) > + return HTMLTokenizer::DataState; > + > + const QualifiedName& contextTag = contextElement->tagQName(); > + > + if (contextTag.matches(titleTag) || contextTag.matches(textareaTag)) > + return HTMLTokenizer::RCDATAState; > + if (contextTag.matches(styleTag) > + || contextTag.matches(xmpTag) > + || contextTag.matches(iframeTag) > + || (contextTag.matches(noembedTag) && HTMLTreeBuilder::pluginsEnabled(contextElement->document()->frame())) > + || (contextTag.matches(noscriptTag) && HTMLTreeBuilder::scriptEnabled(contextElement->document()->frame())) > + || contextTag.matches(noframesTag)) > + return reportErrors ? HTMLTokenizer::RAWTEXTState : HTMLTokenizer::PLAINTEXTState; > + if (contextTag.matches(scriptTag)) > + return reportErrors ? HTMLTokenizer::ScriptDataState : HTMLTokenizer::PLAINTEXTState; > + if (contextTag.matches(plaintextTag)) > + return HTMLTokenizer::PLAINTEXTState; > + return HTMLTokenizer::DataState; > +} It is better to write this code out like this rather than using functions in the element classes? Have to run, no time to read the rest of the patch right now.
Eric Seidel (no email)
Comment 6 2010-08-09 09:40:38 PDT
(In reply to comment #5) > (From update of attachment 63903 [details]) > > // FIXME: parse error > > + // fragment case handled by callers. > > mergeAttributesFromTokenIntoElement(token, m_openElements.bodyElement()); > > This is a confusing comment. Exactly what is handled by the callers? Is there some way to assert or check that? Agreed. Basically that function should be killed or renamed. I should kill it or rename it. It was a victim of our first effort to split the original HTMLTreeBuilder.cpp into multiple data structures. That function ended up on the wrong side of the split IMO. > It's also strange that the comment has a period but does not start with a capital letter. I'm just going to remove the comment, it's not helpful. > It is better to write this code out like this rather than using functions in the element classes? Interesting thought. I think in this case, the if-cascade using tag names is better, since it more closely matches the spec. But I agree with you, many times it's better to shove this stuff into the DOM implementations. > Have to run, no time to read the rest of the patch right now. Thanks for the look!
Adam Barth
Comment 7 2010-08-09 10:16:32 PDT
Comment on attachment 63903 [details] Patch WebCore/html/HTMLTreeBuilder.cpp:824 + bool isFragmentCaseWithoutBody(const HTMLElementStack* elementStack) This is wrong. You probably want to cache this information. Basically, you're not covering the case when there are more than three elements on the stack. WebCore/html/HTMLTreeBuilder.cpp:2902 + m_document->finishedParsing(); I would use a return here instead of an else. The else is uglies. WebCore/html/HTMLTreeBuilder.cpp:2900 + // Warning, this may delete the parser, so don't try to do anything else after this. Please move this comment closer to m_document->finishedParsing(). WebCore/html/HTMLTreeBuilder.cpp:2914 + RefPtr<Node> newChild = fragment->document()->adoptNode(child, ec); This is dangerous. We need a version of adoptNode that doesn't run JavaScript. Also, you should grab references to all the children first and then move them to their new home. WebCore/html/HTMLTreeBuilder.h:212 + RefPtr<Document> m_dummyDocumentForFragmentParse; m_dummyDocumentForFragmentParse -> m_dummyDocumentForFragmentParsing ? WebCore/html/HTMLTreeBuilder.h:221 + FragmentParsingContext m_fragmentContext; Insert blank line above.
Eric Seidel (no email)
Comment 8 2010-08-09 12:19:08 PDT
Adam Barth
Comment 9 2010-08-09 12:26:55 PDT
Comment on attachment 63915 [details] Patch WebCore/html/HTMLElementStack.cpp:155 + return m_bodyElement; !!m_bodyElement ?
Eric Seidel (no email)
Comment 10 2010-08-09 12:40:24 PDT
Created attachment 63916 [details] Patch for landing
Eric Seidel (no email)
Comment 11 2010-08-09 13:59:36 PDT
Comment on attachment 63916 [details] Patch for landing Failed runner.html on the commit-bot, will need one more tweak.
Eric Seidel (no email)
Comment 12 2010-08-09 14:52:24 PDT
Note You need to log in before you can comment on or make changes to this bug.