43586 – Use the HTML5 TreeBuilder for Fragment Parsing

RESOLVED FIXED43586

Use the HTML5 TreeBuilder for Fragment Parsing

https://bugs.webkit.org/show_bug.cgi?id=43586

Summary Use the HTML5 TreeBuilder for Fragment Parsing

Eric Seidel (no email)

Reported 2010-08-05 16:13:35 PDT

Use the HTML5 TreeBuilder for Fragment Parsing

Attachments
Patch (25.96 KB, patch) 2010-08-05 16:13 PDT, Eric Seidel (no email)	no flags	Details Formatted Diff Diff
crashes due to DocumentParser use after free? (42.22 KB, patch) 2010-08-06 17:31 PDT, Eric Seidel (no email)	no flags	Details Formatted Diff Diff
works (30.86 KB, patch) 2010-08-07 21:27 PDT, Eric Seidel (no email)	no flags	Details Formatted Diff Diff
Patch (39.72 KB, patch) 2010-08-09 09:28 PDT, Eric Seidel (no email)	no flags	Details Formatted Diff Diff
Patch (41.40 KB, patch) 2010-08-09 12:19 PDT, Eric Seidel (no email)	no flags	Details Formatted Diff Diff
Patch for landing (41.38 KB, patch) 2010-08-09 12:40 PDT, Eric Seidel (no email)	eric: commit-queue-	Details Formatted Diff Diff
Show Obsolete (5) View All Add attachment proposed patch, testcase, etc.

Eric Seidel (no email)

Comment 1 2010-08-05 16:13:56 PDT

Created attachment 63656 [details] Patch

Eric Seidel (no email)

Comment 2 2010-08-06 17:31:17 PDT

Created attachment 63788 [details] crashes due to DocumentParser use after free?

Eric Seidel (no email)

Comment 3 2010-08-07 21:27:25 PDT

Created attachment 63837 [details] works

Eric Seidel (no email)

Comment 4 2010-08-09 09:28:27 PDT

Created attachment 63903 [details] Patch

Darin Adler

Comment 5 2010-08-09 09:33:17 PDT

Comment on attachment 63903 [details] Patch > // FIXME: parse error > + // fragment case handled by callers. > mergeAttributesFromTokenIntoElement(token, m_openElements.bodyElement()); This is a confusing comment. Exactly what is handled by the callers? Is there some way to assert or check that? It's also strange that the comment has a period but does not start with a capital letter. > +// This is a direct transcription of step 4 from: > +// http://www.whatwg.org/specs/web-apps/current-work/multipage/the-end.html#fragment-case > +HTMLTokenizer::State tokenizerStateForContextElement(Element* contextElement, bool reportErrors) > +{ > + if (!contextElement) > + return HTMLTokenizer::DataState; > + > + const QualifiedName& contextTag = contextElement->tagQName(); > + > + if (contextTag.matches(titleTag) || contextTag.matches(textareaTag)) > + return HTMLTokenizer::RCDATAState; > + if (contextTag.matches(styleTag) > + || contextTag.matches(xmpTag) > + || contextTag.matches(iframeTag) > + || (contextTag.matches(noembedTag) && HTMLTreeBuilder::pluginsEnabled(contextElement->document()->frame())) > + || (contextTag.matches(noscriptTag) && HTMLTreeBuilder::scriptEnabled(contextElement->document()->frame())) > + || contextTag.matches(noframesTag)) > + return reportErrors ? HTMLTokenizer::RAWTEXTState : HTMLTokenizer::PLAINTEXTState; > + if (contextTag.matches(scriptTag)) > + return reportErrors ? HTMLTokenizer::ScriptDataState : HTMLTokenizer::PLAINTEXTState; > + if (contextTag.matches(plaintextTag)) > + return HTMLTokenizer::PLAINTEXTState; > + return HTMLTokenizer::DataState; > +} It is better to write this code out like this rather than using functions in the element classes? Have to run, no time to read the rest of the patch right now.

Eric Seidel (no email)

Comment 6 2010-08-09 09:40:38 PDT

(In reply to comment #5) > (From update of attachment 63903 [details]) > > // FIXME: parse error > > + // fragment case handled by callers. > > mergeAttributesFromTokenIntoElement(token, m_openElements.bodyElement()); > > This is a confusing comment. Exactly what is handled by the callers? Is there some way to assert or check that? Agreed. Basically that function should be killed or renamed. I should kill it or rename it. It was a victim of our first effort to split the original HTMLTreeBuilder.cpp into multiple data structures. That function ended up on the wrong side of the split IMO. > It's also strange that the comment has a period but does not start with a capital letter. I'm just going to remove the comment, it's not helpful. > It is better to write this code out like this rather than using functions in the element classes? Interesting thought. I think in this case, the if-cascade using tag names is better, since it more closely matches the spec. But I agree with you, many times it's better to shove this stuff into the DOM implementations. > Have to run, no time to read the rest of the patch right now. Thanks for the look!

Adam Barth

Comment 7 2010-08-09 10:16:32 PDT

Comment on attachment 63903 [details] Patch WebCore/html/HTMLTreeBuilder.cpp:824 + bool isFragmentCaseWithoutBody(const HTMLElementStack* elementStack) This is wrong. You probably want to cache this information. Basically, you're not covering the case when there are more than three elements on the stack. WebCore/html/HTMLTreeBuilder.cpp:2902 + m_document->finishedParsing(); I would use a return here instead of an else. The else is uglies. WebCore/html/HTMLTreeBuilder.cpp:2900 + // Warning, this may delete the parser, so don't try to do anything else after this. Please move this comment closer to m_document->finishedParsing(). WebCore/html/HTMLTreeBuilder.cpp:2914 + RefPtr<Node> newChild = fragment->document()->adoptNode(child, ec); This is dangerous. We need a version of adoptNode that doesn't run JavaScript. Also, you should grab references to all the children first and then move them to their new home. WebCore/html/HTMLTreeBuilder.h:212 + RefPtr<Document> m_dummyDocumentForFragmentParse; m_dummyDocumentForFragmentParse -> m_dummyDocumentForFragmentParsing ? WebCore/html/HTMLTreeBuilder.h:221 + FragmentParsingContext m_fragmentContext; Insert blank line above.

Eric Seidel (no email)

Comment 8 2010-08-09 12:19:08 PDT

Created attachment 63915 [details] Patch

Adam Barth

Comment 9 2010-08-09 12:26:55 PDT

Comment on attachment 63915 [details] Patch WebCore/html/HTMLElementStack.cpp:155 + return m_bodyElement; !!m_bodyElement ?

Eric Seidel (no email)

Comment 10 2010-08-09 12:40:24 PDT

Created attachment 63916 [details] Patch for landing

Eric Seidel (no email)

Comment 11 2010-08-09 13:59:36 PDT

Comment on attachment 63916 [details] Patch for landing Failed runner.html on the commit-bot, will need one more tweak.

Eric Seidel (no email)

Comment 12 2010-08-09 14:52:24 PDT

Committed r65006: <http://trac.webkit.org/changeset/65006>

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware Other

OS OS X 10.5

Product WebKit

Component New Bugs

Assignee

Eric Seidel (no email)

Reported

2010-08-05 16:13 PDT

Modified

2010-08-09 14:52 PDT History

CC List

1 user Show

URL

Keywords

Depends on

Blocks