RESOLVED FIXED Bug 66406
New XML parser: scripting support
https://bugs.webkit.org/show_bug.cgi?id=66406
Summary New XML parser: scripting support
Vicki Pfau
Reported 2011-08-17 13:46:55 PDT
New XML parser: scripting support
Attachments
Patch (13.24 KB, patch)
2011-08-17 13:48 PDT, Vicki Pfau
no flags
Patch (10.89 KB, patch)
2011-08-18 16:32 PDT, Vicki Pfau
no flags
Vicki Pfau
Comment 1 2011-08-17 13:48:14 PDT
Sam Weinig
Comment 2 2011-08-17 14:08:12 PDT
Comment on attachment 104229 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=104229&action=review > Source/WebCore/ChangeLog:25 > + Reviewed by NOBODY (OOPS!). > + > + * xml/parser/NewXMLDocumentParser.cpp: > + (WebCore::NewXMLDocumentParser::NewXMLDocumentParser): > + (WebCore::NewXMLDocumentParser::resumeParsing): > + (WebCore::NewXMLDocumentParser::processScript): > + (WebCore::NewXMLDocumentParser::textPosition): > + (WebCore::NewXMLDocumentParser::lineNumber): > + (WebCore::NewXMLDocumentParser::append): > + (WebCore::NewXMLDocumentParser::finish): > + (WebCore::NewXMLDocumentParser::finishWasCalled): > + (WebCore::NewXMLDocumentParser::notifyFinished): > + * xml/parser/NewXMLDocumentParser.h: > + (WebCore::NewXMLDocumentParser::pauseParsing): > + (WebCore::NewXMLDocumentParser::setScriptStartPosition): > + * xml/parser/XMLTreeBuilder.cpp: > + (WebCore::XMLTreeBuilder::closeElement): > + (WebCore::XMLTreeBuilder::processStartTag): > + (WebCore::XMLTreeBuilder::processEndTag): > + * xml/parser/XMLTreeBuilder.h: Please include more details in your change log about the functions that are changing. > Source/WebCore/xml/parser/NewXMLDocumentParser.cpp:226 > + scriptElement->executeScript(sourceCode); Can executeScript cause scriptElement to be deallocated? Should scriptElement be in a RefPtr?
Sam Weinig
Comment 3 2011-08-17 14:08:39 PDT
Does this enable any interesting tests to run?
Vicki Pfau
Comment 4 2011-08-17 14:17:25 PDT
(In reply to comment #2) > (From update of attachment 104229 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=104229&action=review > > > Source/WebCore/xml/parser/NewXMLDocumentParser.cpp:226 > > + scriptElement->executeScript(sourceCode); > > Can executeScript cause scriptElement to be deallocated? Should scriptElement be in a RefPtr? scriptElement is not refcounted. It is extracted from the Element object (in either XMLTreeBuilder::StartTag or EndTag), which is in a RefPtr already. (In reply to comment #3) > Does this enable any interesting tests to run? A lot! But I don't have any super-specific examples.
Darin Adler
Comment 5 2011-08-17 14:23:16 PDT
Comment on attachment 104229 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=104229&action=review > Source/WebCore/xml/parser/NewXMLDocumentParser.cpp:45 > + , m_pendingScript(0) > + , m_scriptElement(0) No need to explicitly initialize a CachedResourceHandle or RefPtr. They get set to 0 by their default constructor. > Source/WebCore/xml/parser/NewXMLDocumentParser.cpp:59 > + , m_pendingScript(0) > + , m_scriptElement(0) Ditto.
Adam Barth
Comment 6 2011-08-17 16:17:48 PDT
+ some folks who might like to review this patch.
Adam Barth
Comment 7 2011-08-17 16:25:49 PDT
Comment on attachment 104229 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=104229&action=review > Source/WebCore/xml/parser/NewXMLDocumentParser.cpp:105 > + if (scriptElement->prepareScript(m_scriptStartPosition, ScriptElement::AllowLegacyTypeInTypeAttribute)) { > + if (scriptElement->readyToBeParserExecuted()) > + scriptElement->executeScript(ScriptSourceCode(scriptElement->scriptContent(), document()->url(), m_scriptStartPosition)); > + else if (scriptElement->willBeParserExecuted()) { > + m_pendingScript = scriptElement->cachedScript(); > + m_scriptElement = scriptElement->element(); > + m_pendingScript->addClient(this); > + > + // m_pendingScript will be 0 if script was already loaded and addClient() executed it. > + if (m_pendingScript) > + pauseParsing(); > + } else > + m_scriptElement = 0; > + } Is this code copy/pasted from the HTML parser? Can the code be shared instead? > Source/WebCore/xml/parser/NewXMLDocumentParser.cpp:119 > - return 0; > + return m_tokenizer->lineNumber() + 1; The TextPosition types are supposed to save us from this kind of +1 madness. There should be some way for the C++ type system to do this work for you. > Source/WebCore/xml/parser/NewXMLDocumentParser.cpp:131 > + // Do some bookkeeping to make sure that re-appending doesn't mess up the cursor What is re-appending? Why would we append the same string more than once? > Source/WebCore/xml/parser/NewXMLDocumentParser.cpp:169 > + if (m_parserPaused) > + return; Why can't you tell the parser to finish while it's paused? > Source/WebCore/xml/parser/NewXMLDocumentParser.cpp:185 > - return m_finishWasCalled; > + return m_parserPaused || m_finishWasCalled; This is clearly wrong. This function is just an accessor for m_finishWasCalled.
Vicki Pfau
Comment 8 2011-08-17 16:58:30 PDT
(In reply to comment #7) > (From update of attachment 104229 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=104229&action=review > > > Source/WebCore/xml/parser/NewXMLDocumentParser.cpp:105 > > + if (scriptElement->prepareScript(m_scriptStartPosition, ScriptElement::AllowLegacyTypeInTypeAttribute)) { > > + if (scriptElement->readyToBeParserExecuted()) > > + scriptElement->executeScript(ScriptSourceCode(scriptElement->scriptContent(), document()->url(), m_scriptStartPosition)); > > + else if (scriptElement->willBeParserExecuted()) { > > + m_pendingScript = scriptElement->cachedScript(); > > + m_scriptElement = scriptElement->element(); > > + m_pendingScript->addClient(this); > > + > > + // m_pendingScript will be 0 if script was already loaded and addClient() executed it. > > + if (m_pendingScript) > > + pauseParsing(); > > + } else > > + m_scriptElement = 0; > > + } > > Is this code copy/pasted from the HTML parser? Can the code be shared instead? It's copy-pasted from the other XML parsers, with a little modification. For what it's worth, the code is already copy-pasted between the Qt and the libxml2 ones, so it might make sense to refactor it all out somewhere. Not sure exactly where, though. > > > Source/WebCore/xml/parser/NewXMLDocumentParser.cpp:119 > > - return 0; > > + return m_tokenizer->lineNumber() + 1; > > The TextPosition types are supposed to save us from this kind of +1 madness. There should be some way for the C++ type system to do this work for you. > This function returns int in the superclass, and the tokenizer returns int too, just with a different base (apparently). It's also only line number, while TextPosition keeps track of column too. > > Source/WebCore/xml/parser/NewXMLDocumentParser.cpp:131 > > + // Do some bookkeeping to make sure that re-appending doesn't mess up the cursor > > What is re-appending? Why would we append the same string more than once? > Parsing a script can cause it to pause parsing and re-enter append() with the same SegmentedString (except with it being advanced in the process, so the line numbers change). > > Source/WebCore/xml/parser/NewXMLDocumentParser.cpp:169 > > + if (m_parserPaused) > > + return; > > Why can't you tell the parser to finish while it's paused? Because the parser isn't finished. If we tell it it's finished, it will just stop when we're done with that token. This function gets called when we drop out of the tokenizer, which we need to do when we're processing a remote script (otherwise said script never gets to load). If we do tell it to finish, it will never re-enter the tokenizer after it's done with that script. > > Source/WebCore/xml/parser/NewXMLDocumentParser.cpp:185 > > - return m_finishWasCalled; > > + return m_parserPaused || m_finishWasCalled; > > This is clearly wrong. This function is just an accessor for m_finishWasCalled. This doesn't appear to be called anywhere, anyway. Not sure why I changed it.
James Simonsen
Comment 9 2011-08-17 17:13:16 PDT
Comment on attachment 104229 [details] Patch The ScriptElement handling looks good to me. It's very similar to the existing XML parsers. View in context: https://bugs.webkit.org/attachment.cgi?id=104229&action=review >> Source/WebCore/xml/parser/NewXMLDocumentParser.cpp:169 >> + return; > > Why can't you tell the parser to finish while it's paused? The script may document.write() into it. Or is that sort of thing not allowed with XHTML? > Source/WebCore/xml/parser/XMLTreeBuilder.h:34 > +#include <wtf/text/TextPosition.h> This doesn't appear to be needed here.
Vicki Pfau
Comment 10 2011-08-17 17:44:59 PDT
(In reply to comment #9) > (From update of attachment 104229 [details]) > The ScriptElement handling looks good to me. It's very similar to the existing XML parsers. > > View in context: https://bugs.webkit.org/attachment.cgi?id=104229&action=review > > >> Source/WebCore/xml/parser/NewXMLDocumentParser.cpp:169 > >> + return; > > > > Why can't you tell the parser to finish while it's paused? > > The script may document.write() into it. Or is that sort of thing not allowed with XHTML? XML/XHTML doesn't support document.write() > > > Source/WebCore/xml/parser/XMLTreeBuilder.h:34 > > +#include <wtf/text/TextPosition.h> > > This doesn't appear to be needed here. You appear to be correct.
Vicki Pfau
Comment 11 2011-08-18 16:32:46 PDT
Adam Barth
Comment 12 2011-08-18 16:56:47 PDT
Comment on attachment 104422 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=104422&action=review > Source/WebCore/xml/parser/NewXMLDocumentParser.cpp:145 > + if (m_parserPaused) > + return; I don't understand this part. Why does finish() get an early exit when the parser is paused? In the HTML parser, don't we queue the finish operation and enact it when the parser unpauses? > Source/WebCore/xml/parser/XMLTreeBuilder.cpp:157 > + if (scriptElement) > + m_parser->processScript(scriptElement); > + > + popCurrentNode(); Is there a re-entrancy problem here? In the HTML parser, we return from the tree builder before executing the script to avoid these re-entrancy problems.
Adam Barth
Comment 13 2011-08-18 16:58:11 PDT
Comment on attachment 104422 [details] Patch My sense is that this patch isn't quite perfect, but given that your internship is ending tomorrow, we should probably land this patch and iterate.
Vicki Pfau
Comment 14 2011-08-18 17:06:53 PDT
(In reply to comment #12) > (From update of attachment 104422 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=104422&action=review > > > Source/WebCore/xml/parser/NewXMLDocumentParser.cpp:145 > > + if (m_parserPaused) > > + return; > > I don't understand this part. Why does finish() get an early exit when the parser is paused? In the HTML parser, don't we queue the finish operation and enact it when the parser unpauses? From what I can tell, finish is called when the parser drops out of the tokenizer loop, regardless of our state. I can take a closer look, but it might make more sense for someone to do that later... > > > Source/WebCore/xml/parser/XMLTreeBuilder.cpp:157 > > + if (scriptElement) > > + m_parser->processScript(scriptElement); > > + > > + popCurrentNode(); > > Is there a re-entrancy problem here? In the HTML parser, we return from the tree builder before executing the script to avoid these re-entrancy problems. The script doesn't get run until we drop out of the tree builder--this just queues it up.
James Simonsen
Comment 15 2011-08-18 17:13:18 PDT
Comment on attachment 104422 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=104422&action=review >>> Source/WebCore/xml/parser/XMLTreeBuilder.cpp:157 >>> + popCurrentNode(); >> >> Is there a re-entrancy problem here? In the HTML parser, we return from the tree builder before executing the script to avoid these re-entrancy problems. > > The script doesn't get run until we drop out of the tree builder--this just queues it up. It can execute immediately in processScript() if it's an inline script or it's an external script that's already cached.
Vicki Pfau
Comment 16 2011-08-18 17:20:39 PDT
(In reply to comment #15) > (From update of attachment 104422 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=104422&action=review > > >>> Source/WebCore/xml/parser/XMLTreeBuilder.cpp:157 > >>> + popCurrentNode(); > >> > >> Is there a re-entrancy problem here? In the HTML parser, we return from the tree builder before executing the script to avoid these re-entrancy problems. > > > > The script doesn't get run until we drop out of the tree builder--this just queues it up. > > It can execute immediately in processScript() if it's an inline script or it's an external script that's already cached. You're correct, I forgot about that. Nothing I ran seemed to indicate reentrancy problems, but it's possible that I wasn't thorough enough.
Vicki Pfau
Comment 17 2011-08-18 17:29:46 PDT
Comment on attachment 104422 [details] Patch I'm going to land this now given it's not in live code and I've got some patches that interact with this patch; if necessary, I suppose it can be ripped out or replaced later, but I don't have time to do that myself.
WebKit Review Bot
Comment 18 2011-08-18 18:09:47 PDT
Comment on attachment 104422 [details] Patch Clearing flags on attachment: 104422 Committed r93380: <http://trac.webkit.org/changeset/93380>
WebKit Review Bot
Comment 19 2011-08-18 18:09:53 PDT
All reviewed patches have been landed. Closing bug.
Note You need to log in before you can comment on or make changes to this bug.