Bug 66406 - New XML parser: scripting support
Summary: New XML parser: scripting support
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Vicki Pfau
URL:
Keywords:
Depends on:
Blocks: 64396
  Show dependency treegraph
 
Reported: 2011-08-17 13:46 PDT by Vicki Pfau
Modified: 2011-08-18 18:09 PDT (History)
6 users (show)

See Also:


Attachments
Patch (13.24 KB, patch)
2011-08-17 13:48 PDT, Vicki Pfau
no flags Details | Formatted Diff | Diff
Patch (10.89 KB, patch)
2011-08-18 16:32 PDT, Vicki Pfau
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Vicki Pfau 2011-08-17 13:46:55 PDT
New XML parser: scripting support
Comment 1 Vicki Pfau 2011-08-17 13:48:14 PDT
Created attachment 104229 [details]
Patch
Comment 2 Sam Weinig 2011-08-17 14:08:12 PDT
Comment on attachment 104229 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=104229&action=review

> Source/WebCore/ChangeLog:25
> +        Reviewed by NOBODY (OOPS!).
> +
> +        * xml/parser/NewXMLDocumentParser.cpp:
> +        (WebCore::NewXMLDocumentParser::NewXMLDocumentParser):
> +        (WebCore::NewXMLDocumentParser::resumeParsing):
> +        (WebCore::NewXMLDocumentParser::processScript):
> +        (WebCore::NewXMLDocumentParser::textPosition):
> +        (WebCore::NewXMLDocumentParser::lineNumber):
> +        (WebCore::NewXMLDocumentParser::append):
> +        (WebCore::NewXMLDocumentParser::finish):
> +        (WebCore::NewXMLDocumentParser::finishWasCalled):
> +        (WebCore::NewXMLDocumentParser::notifyFinished):
> +        * xml/parser/NewXMLDocumentParser.h:
> +        (WebCore::NewXMLDocumentParser::pauseParsing):
> +        (WebCore::NewXMLDocumentParser::setScriptStartPosition):
> +        * xml/parser/XMLTreeBuilder.cpp:
> +        (WebCore::XMLTreeBuilder::closeElement):
> +        (WebCore::XMLTreeBuilder::processStartTag):
> +        (WebCore::XMLTreeBuilder::processEndTag):
> +        * xml/parser/XMLTreeBuilder.h:

Please include more details in your change log about the functions that are changing.

> Source/WebCore/xml/parser/NewXMLDocumentParser.cpp:226
> +        scriptElement->executeScript(sourceCode);

Can executeScript cause scriptElement to be deallocated?  Should scriptElement be in a RefPtr?
Comment 3 Sam Weinig 2011-08-17 14:08:39 PDT
Does this enable any interesting tests to run?
Comment 4 Vicki Pfau 2011-08-17 14:17:25 PDT
(In reply to comment #2)
> (From update of attachment 104229 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=104229&action=review
> 
> > Source/WebCore/xml/parser/NewXMLDocumentParser.cpp:226
> > +        scriptElement->executeScript(sourceCode);
> 
> Can executeScript cause scriptElement to be deallocated?  Should scriptElement be in a RefPtr?

scriptElement is not refcounted. It is extracted from the Element object (in either XMLTreeBuilder::StartTag or EndTag), which is in a RefPtr already.

(In reply to comment #3)
> Does this enable any interesting tests to run?

A lot! But I don't have any super-specific examples.
Comment 5 Darin Adler 2011-08-17 14:23:16 PDT
Comment on attachment 104229 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=104229&action=review

> Source/WebCore/xml/parser/NewXMLDocumentParser.cpp:45
> +    , m_pendingScript(0)
> +    , m_scriptElement(0)

No need to explicitly initialize a CachedResourceHandle or RefPtr. They get set to 0 by their default constructor.

> Source/WebCore/xml/parser/NewXMLDocumentParser.cpp:59
> +    , m_pendingScript(0)
> +    , m_scriptElement(0)

Ditto.
Comment 6 Adam Barth 2011-08-17 16:17:48 PDT
+ some folks who might like to review this patch.
Comment 7 Adam Barth 2011-08-17 16:25:49 PDT
Comment on attachment 104229 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=104229&action=review

> Source/WebCore/xml/parser/NewXMLDocumentParser.cpp:105
> +    if (scriptElement->prepareScript(m_scriptStartPosition, ScriptElement::AllowLegacyTypeInTypeAttribute)) {
> +        if (scriptElement->readyToBeParserExecuted())
> +            scriptElement->executeScript(ScriptSourceCode(scriptElement->scriptContent(), document()->url(), m_scriptStartPosition));
> +        else if (scriptElement->willBeParserExecuted()) {
> +            m_pendingScript = scriptElement->cachedScript();
> +            m_scriptElement = scriptElement->element();
> +            m_pendingScript->addClient(this);
> +
> +            // m_pendingScript will be 0 if script was already loaded and addClient() executed it.
> +            if (m_pendingScript)
> +                pauseParsing();
> +        } else
> +            m_scriptElement = 0;
> +    }

Is this code copy/pasted from the HTML parser?  Can the code be shared instead?

> Source/WebCore/xml/parser/NewXMLDocumentParser.cpp:119
> -    return 0;
> +    return m_tokenizer->lineNumber() + 1;

The TextPosition types are supposed to save us from this kind of +1 madness.  There should be some way for the C++ type system to do this work for you.

> Source/WebCore/xml/parser/NewXMLDocumentParser.cpp:131
> +    // Do some bookkeeping to make sure that re-appending doesn't mess up the cursor

What is re-appending?  Why would we append the same string more than once?

> Source/WebCore/xml/parser/NewXMLDocumentParser.cpp:169
> +    if (m_parserPaused)
> +        return;

Why can't you tell the parser to finish while it's paused?

> Source/WebCore/xml/parser/NewXMLDocumentParser.cpp:185
> -    return m_finishWasCalled;
> +    return m_parserPaused || m_finishWasCalled;

This is clearly wrong.  This function is just an accessor for m_finishWasCalled.
Comment 8 Vicki Pfau 2011-08-17 16:58:30 PDT
(In reply to comment #7)
> (From update of attachment 104229 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=104229&action=review
> 
> > Source/WebCore/xml/parser/NewXMLDocumentParser.cpp:105
> > +    if (scriptElement->prepareScript(m_scriptStartPosition, ScriptElement::AllowLegacyTypeInTypeAttribute)) {
> > +        if (scriptElement->readyToBeParserExecuted())
> > +            scriptElement->executeScript(ScriptSourceCode(scriptElement->scriptContent(), document()->url(), m_scriptStartPosition));
> > +        else if (scriptElement->willBeParserExecuted()) {
> > +            m_pendingScript = scriptElement->cachedScript();
> > +            m_scriptElement = scriptElement->element();
> > +            m_pendingScript->addClient(this);
> > +
> > +            // m_pendingScript will be 0 if script was already loaded and addClient() executed it.
> > +            if (m_pendingScript)
> > +                pauseParsing();
> > +        } else
> > +            m_scriptElement = 0;
> > +    }
> 
> Is this code copy/pasted from the HTML parser?  Can the code be shared instead?

It's copy-pasted from the other XML parsers, with a little modification. For what it's worth, the code is already copy-pasted between the Qt and the libxml2 ones, so it might make sense to refactor it all out somewhere. Not sure exactly where, though.

> 
> > Source/WebCore/xml/parser/NewXMLDocumentParser.cpp:119
> > -    return 0;
> > +    return m_tokenizer->lineNumber() + 1;
> 
> The TextPosition types are supposed to save us from this kind of +1 madness.  There should be some way for the C++ type system to do this work for you.
> 

This function returns int in the superclass, and the tokenizer returns int too, just with a different base (apparently). It's also only line number, while TextPosition keeps track of column too.

> > Source/WebCore/xml/parser/NewXMLDocumentParser.cpp:131
> > +    // Do some bookkeeping to make sure that re-appending doesn't mess up the cursor
> 
> What is re-appending?  Why would we append the same string more than once?
> 

Parsing a script can cause it to pause parsing and re-enter append() with the same SegmentedString (except with it being advanced in the process, so the line numbers change).

> > Source/WebCore/xml/parser/NewXMLDocumentParser.cpp:169
> > +    if (m_parserPaused)
> > +        return;
> 
> Why can't you tell the parser to finish while it's paused?

Because the parser isn't finished. If we tell it it's finished, it will just stop when we're done with that token. This function gets called when we drop out of the tokenizer, which we need to do when we're processing a remote script (otherwise said script never gets to load). If we do tell it to finish, it will never re-enter the tokenizer after it's done with that script.

> > Source/WebCore/xml/parser/NewXMLDocumentParser.cpp:185
> > -    return m_finishWasCalled;
> > +    return m_parserPaused || m_finishWasCalled;
> 
> This is clearly wrong.  This function is just an accessor for m_finishWasCalled.

This doesn't appear to be called anywhere, anyway. Not sure why I changed it.
Comment 9 James Simonsen 2011-08-17 17:13:16 PDT
Comment on attachment 104229 [details]
Patch

The ScriptElement handling looks good to me. It's very similar to the existing XML parsers.

View in context: https://bugs.webkit.org/attachment.cgi?id=104229&action=review

>> Source/WebCore/xml/parser/NewXMLDocumentParser.cpp:169
>> +        return;
> 
> Why can't you tell the parser to finish while it's paused?

The script may document.write() into it. Or is that sort of thing not allowed with XHTML?

> Source/WebCore/xml/parser/XMLTreeBuilder.h:34
> +#include <wtf/text/TextPosition.h>

This doesn't appear to be needed here.
Comment 10 Vicki Pfau 2011-08-17 17:44:59 PDT
(In reply to comment #9)
> (From update of attachment 104229 [details])
> The ScriptElement handling looks good to me. It's very similar to the existing XML parsers.
> 
> View in context: https://bugs.webkit.org/attachment.cgi?id=104229&action=review
> 
> >> Source/WebCore/xml/parser/NewXMLDocumentParser.cpp:169
> >> +        return;
> > 
> > Why can't you tell the parser to finish while it's paused?
> 
> The script may document.write() into it. Or is that sort of thing not allowed with XHTML?

XML/XHTML doesn't support document.write()

> 
> > Source/WebCore/xml/parser/XMLTreeBuilder.h:34
> > +#include <wtf/text/TextPosition.h>
> 
> This doesn't appear to be needed here.

You appear to be correct.
Comment 11 Vicki Pfau 2011-08-18 16:32:46 PDT
Created attachment 104422 [details]
Patch
Comment 12 Adam Barth 2011-08-18 16:56:47 PDT
Comment on attachment 104422 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=104422&action=review

> Source/WebCore/xml/parser/NewXMLDocumentParser.cpp:145
> +    if (m_parserPaused)
> +        return;

I don't understand this part.  Why does finish() get an early exit when the parser is paused?  In the HTML parser, don't we queue the finish operation and enact it when the parser unpauses?

> Source/WebCore/xml/parser/XMLTreeBuilder.cpp:157
> +    if (scriptElement)
> +        m_parser->processScript(scriptElement);
> +
> +    popCurrentNode();

Is there a re-entrancy problem here?  In the HTML parser, we return from the tree builder before executing the script to avoid these re-entrancy problems.
Comment 13 Adam Barth 2011-08-18 16:58:11 PDT
Comment on attachment 104422 [details]
Patch

My sense is that this patch isn't quite perfect, but given that your internship is ending tomorrow, we should probably land this patch and iterate.
Comment 14 Vicki Pfau 2011-08-18 17:06:53 PDT
(In reply to comment #12)
> (From update of attachment 104422 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=104422&action=review
> 
> > Source/WebCore/xml/parser/NewXMLDocumentParser.cpp:145
> > +    if (m_parserPaused)
> > +        return;
> 
> I don't understand this part.  Why does finish() get an early exit when the parser is paused?  In the HTML parser, don't we queue the finish operation and enact it when the parser unpauses?

From what I can tell, finish is called when the parser drops out of the tokenizer loop, regardless of our state. I can take a closer look, but it might make more sense for someone to do that later...

> 
> > Source/WebCore/xml/parser/XMLTreeBuilder.cpp:157
> > +    if (scriptElement)
> > +        m_parser->processScript(scriptElement);
> > +
> > +    popCurrentNode();
> 
> Is there a re-entrancy problem here?  In the HTML parser, we return from the tree builder before executing the script to avoid these re-entrancy problems.

The script doesn't get run until we drop out of the tree builder--this just queues it up.
Comment 15 James Simonsen 2011-08-18 17:13:18 PDT
Comment on attachment 104422 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=104422&action=review

>>> Source/WebCore/xml/parser/XMLTreeBuilder.cpp:157
>>> +    popCurrentNode();
>> 
>> Is there a re-entrancy problem here?  In the HTML parser, we return from the tree builder before executing the script to avoid these re-entrancy problems.
> 
> The script doesn't get run until we drop out of the tree builder--this just queues it up.

It can execute immediately in processScript() if it's an inline script or it's an external script that's already cached.
Comment 16 Vicki Pfau 2011-08-18 17:20:39 PDT
(In reply to comment #15)
> (From update of attachment 104422 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=104422&action=review
> 
> >>> Source/WebCore/xml/parser/XMLTreeBuilder.cpp:157
> >>> +    popCurrentNode();
> >> 
> >> Is there a re-entrancy problem here?  In the HTML parser, we return from the tree builder before executing the script to avoid these re-entrancy problems.
> > 
> > The script doesn't get run until we drop out of the tree builder--this just queues it up.
> 
> It can execute immediately in processScript() if it's an inline script or it's an external script that's already cached.

You're correct, I forgot about that. Nothing I ran seemed to indicate reentrancy problems, but it's possible that I wasn't thorough enough.
Comment 17 Vicki Pfau 2011-08-18 17:29:46 PDT
Comment on attachment 104422 [details]
Patch

I'm going to land this now given it's not in live code and I've got some patches that interact with this patch; if necessary, I suppose it can be ripped out or replaced later, but I don't have time to do that myself.
Comment 18 WebKit Review Bot 2011-08-18 18:09:47 PDT
Comment on attachment 104422 [details]
Patch

Clearing flags on attachment: 104422

Committed r93380: <http://trac.webkit.org/changeset/93380>
Comment 19 WebKit Review Bot 2011-08-18 18:09:53 PDT
All reviewed patches have been landed.  Closing bug.