Bug 43586 - Use the HTML5 TreeBuilder for Fragment Parsing
Summary: Use the HTML5 TreeBuilder for Fragment Parsing
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: 528+ (Nightly build)
Hardware: Other OS X 10.5
: P2 Normal
Assignee: Eric Seidel (no email)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-08-05 16:13 PDT by Eric Seidel (no email)
Modified: 2010-08-09 14:52 PDT (History)
1 user (show)

See Also:


Attachments
Patch (25.96 KB, patch)
2010-08-05 16:13 PDT, Eric Seidel (no email)
no flags Details | Formatted Diff | Diff
crashes due to DocumentParser use after free? (42.22 KB, patch)
2010-08-06 17:31 PDT, Eric Seidel (no email)
no flags Details | Formatted Diff | Diff
works (30.86 KB, patch)
2010-08-07 21:27 PDT, Eric Seidel (no email)
no flags Details | Formatted Diff | Diff
Patch (39.72 KB, patch)
2010-08-09 09:28 PDT, Eric Seidel (no email)
no flags Details | Formatted Diff | Diff
Patch (41.40 KB, patch)
2010-08-09 12:19 PDT, Eric Seidel (no email)
no flags Details | Formatted Diff | Diff
Patch for landing (41.38 KB, patch)
2010-08-09 12:40 PDT, Eric Seidel (no email)
eric: commit-queue-
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Eric Seidel (no email) 2010-08-05 16:13:35 PDT
Use the HTML5 TreeBuilder for Fragment Parsing
Comment 1 Eric Seidel (no email) 2010-08-05 16:13:56 PDT
Created attachment 63656 [details]
Patch
Comment 2 Eric Seidel (no email) 2010-08-06 17:31:17 PDT
Created attachment 63788 [details]
crashes due to DocumentParser use after free?
Comment 3 Eric Seidel (no email) 2010-08-07 21:27:25 PDT
Created attachment 63837 [details]
works
Comment 4 Eric Seidel (no email) 2010-08-09 09:28:27 PDT
Created attachment 63903 [details]
Patch
Comment 5 Darin Adler 2010-08-09 09:33:17 PDT
Comment on attachment 63903 [details]
Patch

>      // FIXME: parse error
> +    // fragment case handled by callers.
>      mergeAttributesFromTokenIntoElement(token, m_openElements.bodyElement());

This is a confusing comment. Exactly what is handled by the callers? Is there some way to assert or check that?

It's also strange that the comment has a period but does not start with a capital letter.

> +// This is a direct transcription of step 4 from:
> +// http://www.whatwg.org/specs/web-apps/current-work/multipage/the-end.html#fragment-case
> +HTMLTokenizer::State tokenizerStateForContextElement(Element* contextElement, bool reportErrors)
> +{
> +    if (!contextElement)
> +        return HTMLTokenizer::DataState;
> +
> +    const QualifiedName& contextTag = contextElement->tagQName();
> +
> +    if (contextTag.matches(titleTag) || contextTag.matches(textareaTag))
> +        return HTMLTokenizer::RCDATAState;
> +    if (contextTag.matches(styleTag)
> +        || contextTag.matches(xmpTag)
> +        || contextTag.matches(iframeTag)
> +        || (contextTag.matches(noembedTag) && HTMLTreeBuilder::pluginsEnabled(contextElement->document()->frame()))
> +        || (contextTag.matches(noscriptTag) && HTMLTreeBuilder::scriptEnabled(contextElement->document()->frame()))
> +        || contextTag.matches(noframesTag))
> +        return reportErrors ? HTMLTokenizer::RAWTEXTState : HTMLTokenizer::PLAINTEXTState;
> +    if (contextTag.matches(scriptTag))
> +        return reportErrors ? HTMLTokenizer::ScriptDataState : HTMLTokenizer::PLAINTEXTState;
> +    if (contextTag.matches(plaintextTag))
> +        return HTMLTokenizer::PLAINTEXTState;
> +    return HTMLTokenizer::DataState;
> +}

It is better to write this code out like this rather than using functions in the element classes?

Have to run, no time to read the rest of the patch right now.
Comment 6 Eric Seidel (no email) 2010-08-09 09:40:38 PDT
(In reply to comment #5)
> (From update of attachment 63903 [details])
> >      // FIXME: parse error
> > +    // fragment case handled by callers.
> >      mergeAttributesFromTokenIntoElement(token, m_openElements.bodyElement());
> 
> This is a confusing comment. Exactly what is handled by the callers? Is there some way to assert or check that?

Agreed.  Basically that function should be killed or renamed.  I should kill it or rename it.  It was a victim of our first effort to split the original HTMLTreeBuilder.cpp into multiple data structures.  That function ended up on the wrong side of the split IMO.

> It's also strange that the comment has a period but does not start with a capital letter.

I'm just going to remove the comment, it's not helpful.

> It is better to write this code out like this rather than using functions in the element classes?

Interesting thought.  I think in this case, the if-cascade using tag names is better, since it more closely matches the spec.  But I agree with you, many times it's better to shove this stuff into the DOM implementations.

> Have to run, no time to read the rest of the patch right now.

Thanks for the look!
Comment 7 Adam Barth 2010-08-09 10:16:32 PDT
Comment on attachment 63903 [details]
Patch

WebCore/html/HTMLTreeBuilder.cpp:824
 +  bool isFragmentCaseWithoutBody(const HTMLElementStack* elementStack)
This is wrong.  You probably want to cache this information.  Basically, you're not covering the case when there are more than three elements on the stack.

WebCore/html/HTMLTreeBuilder.cpp:2902
 +          m_document->finishedParsing();
I would use a return here instead of an else.  The else is uglies.

WebCore/html/HTMLTreeBuilder.cpp:2900
 +      // Warning, this may delete the parser, so don't try to do anything else after this.
Please move this comment closer to m_document->finishedParsing().

WebCore/html/HTMLTreeBuilder.cpp:2914
 +              RefPtr<Node> newChild = fragment->document()->adoptNode(child, ec);
This is dangerous.  We need a version of adoptNode that doesn't run JavaScript.  Also, you should grab references to all the children first and then move them to their new home.

WebCore/html/HTMLTreeBuilder.h:212
 +          RefPtr<Document> m_dummyDocumentForFragmentParse;
m_dummyDocumentForFragmentParse -> m_dummyDocumentForFragmentParsing ?

WebCore/html/HTMLTreeBuilder.h:221
 +      FragmentParsingContext m_fragmentContext;
Insert blank line above.
Comment 8 Eric Seidel (no email) 2010-08-09 12:19:08 PDT
Created attachment 63915 [details]
Patch
Comment 9 Adam Barth 2010-08-09 12:26:55 PDT
Comment on attachment 63915 [details]
Patch

WebCore/html/HTMLElementStack.cpp:155
 +      return m_bodyElement;
!!m_bodyElement  ?
Comment 10 Eric Seidel (no email) 2010-08-09 12:40:24 PDT
Created attachment 63916 [details]
Patch for landing
Comment 11 Eric Seidel (no email) 2010-08-09 13:59:36 PDT
Comment on attachment 63916 [details]
Patch for landing

Failed runner.html on the commit-bot, will need one more tweak.
Comment 12 Eric Seidel (no email) 2010-08-09 14:52:24 PDT
Committed r65006: <http://trac.webkit.org/changeset/65006>