Bug 43586

Summary: Use the HTML5 TreeBuilder for Fragment Parsing
Product: WebKit Reporter: Eric Seidel (no email) <eric>
Component: New BugsAssignee: Eric Seidel (no email) <eric>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Other   
OS: OS X 10.5   
Attachments:
Description Flags
Patch
none
crashes due to DocumentParser use after free?
none
works
none
Patch
none
Patch
none
Patch for landing eric: commit-queue-

Description Eric Seidel (no email) 2010-08-05 16:13:35 PDT
Use the HTML5 TreeBuilder for Fragment Parsing
Comment 1 Eric Seidel (no email) 2010-08-05 16:13:56 PDT
Created attachment 63656 [details]
Patch
Comment 2 Eric Seidel (no email) 2010-08-06 17:31:17 PDT
Created attachment 63788 [details]
crashes due to DocumentParser use after free?
Comment 3 Eric Seidel (no email) 2010-08-07 21:27:25 PDT
Created attachment 63837 [details]
works
Comment 4 Eric Seidel (no email) 2010-08-09 09:28:27 PDT
Created attachment 63903 [details]
Patch
Comment 5 Darin Adler 2010-08-09 09:33:17 PDT
Comment on attachment 63903 [details]
Patch

>      // FIXME: parse error
> +    // fragment case handled by callers.
>      mergeAttributesFromTokenIntoElement(token, m_openElements.bodyElement());

This is a confusing comment. Exactly what is handled by the callers? Is there some way to assert or check that?

It's also strange that the comment has a period but does not start with a capital letter.

> +// This is a direct transcription of step 4 from:
> +// http://www.whatwg.org/specs/web-apps/current-work/multipage/the-end.html#fragment-case
> +HTMLTokenizer::State tokenizerStateForContextElement(Element* contextElement, bool reportErrors)
> +{
> +    if (!contextElement)
> +        return HTMLTokenizer::DataState;
> +
> +    const QualifiedName& contextTag = contextElement->tagQName();
> +
> +    if (contextTag.matches(titleTag) || contextTag.matches(textareaTag))
> +        return HTMLTokenizer::RCDATAState;
> +    if (contextTag.matches(styleTag)
> +        || contextTag.matches(xmpTag)
> +        || contextTag.matches(iframeTag)
> +        || (contextTag.matches(noembedTag) && HTMLTreeBuilder::pluginsEnabled(contextElement->document()->frame()))
> +        || (contextTag.matches(noscriptTag) && HTMLTreeBuilder::scriptEnabled(contextElement->document()->frame()))
> +        || contextTag.matches(noframesTag))
> +        return reportErrors ? HTMLTokenizer::RAWTEXTState : HTMLTokenizer::PLAINTEXTState;
> +    if (contextTag.matches(scriptTag))
> +        return reportErrors ? HTMLTokenizer::ScriptDataState : HTMLTokenizer::PLAINTEXTState;
> +    if (contextTag.matches(plaintextTag))
> +        return HTMLTokenizer::PLAINTEXTState;
> +    return HTMLTokenizer::DataState;
> +}

It is better to write this code out like this rather than using functions in the element classes?

Have to run, no time to read the rest of the patch right now.
Comment 6 Eric Seidel (no email) 2010-08-09 09:40:38 PDT
(In reply to comment #5)
> (From update of attachment 63903 [details])
> >      // FIXME: parse error
> > +    // fragment case handled by callers.
> >      mergeAttributesFromTokenIntoElement(token, m_openElements.bodyElement());
> 
> This is a confusing comment. Exactly what is handled by the callers? Is there some way to assert or check that?

Agreed.  Basically that function should be killed or renamed.  I should kill it or rename it.  It was a victim of our first effort to split the original HTMLTreeBuilder.cpp into multiple data structures.  That function ended up on the wrong side of the split IMO.

> It's also strange that the comment has a period but does not start with a capital letter.

I'm just going to remove the comment, it's not helpful.

> It is better to write this code out like this rather than using functions in the element classes?

Interesting thought.  I think in this case, the if-cascade using tag names is better, since it more closely matches the spec.  But I agree with you, many times it's better to shove this stuff into the DOM implementations.

> Have to run, no time to read the rest of the patch right now.

Thanks for the look!
Comment 7 Adam Barth 2010-08-09 10:16:32 PDT
Comment on attachment 63903 [details]
Patch

WebCore/html/HTMLTreeBuilder.cpp:824
 +  bool isFragmentCaseWithoutBody(const HTMLElementStack* elementStack)
This is wrong.  You probably want to cache this information.  Basically, you're not covering the case when there are more than three elements on the stack.

WebCore/html/HTMLTreeBuilder.cpp:2902
 +          m_document->finishedParsing();
I would use a return here instead of an else.  The else is uglies.

WebCore/html/HTMLTreeBuilder.cpp:2900
 +      // Warning, this may delete the parser, so don't try to do anything else after this.
Please move this comment closer to m_document->finishedParsing().

WebCore/html/HTMLTreeBuilder.cpp:2914
 +              RefPtr<Node> newChild = fragment->document()->adoptNode(child, ec);
This is dangerous.  We need a version of adoptNode that doesn't run JavaScript.  Also, you should grab references to all the children first and then move them to their new home.

WebCore/html/HTMLTreeBuilder.h:212
 +          RefPtr<Document> m_dummyDocumentForFragmentParse;
m_dummyDocumentForFragmentParse -> m_dummyDocumentForFragmentParsing ?

WebCore/html/HTMLTreeBuilder.h:221
 +      FragmentParsingContext m_fragmentContext;
Insert blank line above.
Comment 8 Eric Seidel (no email) 2010-08-09 12:19:08 PDT
Created attachment 63915 [details]
Patch
Comment 9 Adam Barth 2010-08-09 12:26:55 PDT
Comment on attachment 63915 [details]
Patch

WebCore/html/HTMLElementStack.cpp:155
 +      return m_bodyElement;
!!m_bodyElement  ?
Comment 10 Eric Seidel (no email) 2010-08-09 12:40:24 PDT
Created attachment 63916 [details]
Patch for landing
Comment 11 Eric Seidel (no email) 2010-08-09 13:59:36 PDT
Comment on attachment 63916 [details]
Patch for landing

Failed runner.html on the commit-bot, will need one more tweak.
Comment 12 Eric Seidel (no email) 2010-08-09 14:52:24 PDT
Committed r65006: <http://trac.webkit.org/changeset/65006>