Summary: | Add a new baseclass for XML, HTML and Text DocumentParsers to clean up DocumentParser call sites | ||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Eric Seidel (no email) <eric> | ||||||||||||||
Component: | New Bugs | Assignee: | Eric Seidel (no email) <eric> | ||||||||||||||
Status: | RESOLVED FIXED | ||||||||||||||||
Severity: | Normal | CC: | abarth, commit-queue, darin, dbates, eric, tonyg, webkit.review.bot | ||||||||||||||
Priority: | P2 | ||||||||||||||||
Version: | 528+ (Nightly build) | ||||||||||||||||
Hardware: | Other | ||||||||||||||||
OS: | OS X 10.5 | ||||||||||||||||
Bug Depends on: | 41136 | ||||||||||||||||
Bug Blocks: | |||||||||||||||||
Attachments: |
|
Description
Eric Seidel (no email)
2010-06-24 02:46:14 PDT
Created attachment 59626 [details]
Patch
Comment on attachment 59626 [details]
Patch
I view this patch as an imperfect step in the right direction.
It's actually probably ready for review except for missing build system modifications and possible need of name change.
ScriptableDocumentParser was the best I could come up with. But there is probably better.
It may make sense to make TextDocumentParser a direct subclass of DocumentParser and then make ScriptableDocumentParser a subclass of TextDocumentParser. It's unclear. Right now TextDocumentParser is "scriptable" so that it can have "write()" instead of writeRawData.
There are really two divisions here. One for what sort of data do you want (raw vs. decoded) which might be solved best by moving the decoding into the base DocumentParser class. and a second of whether you as a DocumentParser support all the scripting stuff that was previously on the DocumentParser base interface.
One of the nice side-effects of this patch is now nearly every method on ScriptableDocumentParser is abstract since we don't need default implementations to support the RawDataDocumentParsers anymore!
Maybe a "TextDataDocumentParser" intermediary base class to handle write() and wellFormed() would be best. It's also unclear if I should move synchronousScriptWrite() down onto the base DocumentParser class in preparation for one day solving https://bugs.webkit.org/show_bug.cgi?id=25397. I think in any non-textdata document parser synchronousScriptWrite() just ends up calling open, so maybe that's what the DocumentWriter should do on its behalf. Created attachment 59627 [details]
Ready for review
Comment on attachment 59627 [details]
Ready for review
Some initial comments. I'll review this more this afternoon.
WebCore/dom/Document.cpp:1723
+ || (scriptableDocumentParser() && scriptableDocumentParser()->isExecutingScript()))
This calls the virtual function twice for no reason. You should save the result in a local.
WebCore/dom/Document.cpp:1762
+ if (m_frame && scriptableDocumentParser())
Same bad pattern.
WebCore/dom/Document.cpp:2004
+ // FIXME: JS code can always call document.write, all DocumentParsers may
I don't think this comment is correct. document.write should only exist for HTML documents.
WebCore/dom/ScriptableDocumentParser.cpp:38
+ void ScriptableDocumentParser::synchronousScriptWrite(const SegmentedString& source)
Why not "insert" ?
WebCore/dom/ScriptableDocumentParser.cpp:43
+ void ScriptableDocumentParser::appendNetworkData(const SegmentedString& source)
Why not "append" ?
Comment on attachment 59627 [details]
Ready for review
As we discussed in person, the patch has too much RTTI. I think that's because the interface is being drawn at the wrong place.
Created attachment 59844 [details]
Patch
Comment on attachment 59844 [details] Patch I'm not sure about the name "scriptable document parser". What does "scriptable" mean here? I think it's a bug that scripts don't work in image documents, for example, so we probably should not be enshrining that bug in our design or class names. If that's what scriptable means. > - if (DocumentParser* parser = m_frame->document()->parser()) > + ScriptableDocumentParser* parser = m_frame->document()->scriptableDocumentParser(); > + if (parser) > return parser->lineNumber() + 1; Why did you take the definition of the variable out of the if statement? If you're going to do that, then I suggest changing the style to early return. Or you could leave the style the way it was. > + // FIXME: This is a hack for HTMLFormControlElement::removedFromTree > virtual LegacyHTMLTreeBuilder* htmlTreeBuilder() const { return 0; } I find this comment confusing. Is there some specific sense in which having a way to downcast the tree builder is a hack? I understand that RTTI is not elegant, but hack seems a strong word here. > + ScriptableDocumentParser(Document* document, bool viewSourceMode = false); The argument name "document" could be omitted here. This constructor should be protected. Can any of the other public members be protected? > + // Only used by Document::open for deciding if its safe to act on a > + // javscript document.open() call right now, or it should be ignored. > + virtual bool isExecutingScript() const { return false; } I suggest spelling it "JavaScript". Generally speaking I don't think functions should have comments listing who calls them. Is there some other way you can say this? > +}; > + There's an extra semicolon here after the namespace ends. > +// FIXME: Why is this different from SVGDocumentExtentions parserLineNumber? Spelling error: SVGDocumentExtensions. > +static int parserLineNumber(Document* document) > { > - DocumentParser* parser = document->parser(); > + if (document && document->scriptableDocumentParser()) > + return document->scriptableDocumentParser()->lineNumber() + 1; > + return 0; > +} It's not good to call a function that does RTTI like scriptableDocumentParser twice in a row. Is there a better way to write this? Local variable and early return perhaps? > +// FIXME: TextDocumentParser could just be an HTMLDocumentParser > +// which started the Tokenizer in the PlainText state. Really? Once you're in the PlainText state you can't get out? Seems OK. r=me despite the concerns above (In reply to comment #8) > (From update of attachment 59844 [details]) > I'm not sure about the name "scriptable document parser". What does "scriptable" mean here? I think it's a bug that scripts don't work in image documents, for example, so we probably should not be enshrining that bug in our design or class names. If that's what scriptable means. Agreed. I'm not exactly sure how you would run a script in an image document however. The user typing a javascript: url? > > - if (DocumentParser* parser = m_frame->document()->parser()) > > + ScriptableDocumentParser* parser = m_frame->document()->scriptableDocumentParser(); > > + if (parser) > > return parser->lineNumber() + 1; > > Why did you take the definition of the variable out of the if statement? If you're going to do that, then I suggest changing the style to early return. Or you could leave the style the way it was. I did it to make the line shorter. Your'e right that early return would be better, will fix. > > + // FIXME: This is a hack for HTMLFormControlElement::removedFromTree > > virtual LegacyHTMLTreeBuilder* htmlTreeBuilder() const { return 0; } > > I find this comment confusing. Is there some specific sense in which having a way to downcast the tree builder is a hack? I understand that RTTI is not elegant, but hack seems a strong word here. Oh, it's not just a specific downcast this is doing here. It's grabbing at the treebuilder which is otherwise considered an "implementation detail" of the parser. Even worse, it's expecting back a LegacyHTMLTreeBuilder, which once Adam and I finish the new tree builder won't even be possible. In order to finish the new tree builder we'll need to fix/remove this hack. > > + ScriptableDocumentParser(Document* document, bool viewSourceMode = false); > > The argument name "document" could be omitted here. This constructor should be protected. Can any of the other public members be protected? Agreed! I'll look for more members to make private/protected. That was part of the point of all these changes, I'm sad I overlooked a chance to do more API restriction here. Thanks for the catch. > > + // Only used by Document::open for deciding if its safe to act on a > > + // javscript document.open() call right now, or it should be ignored. > > + virtual bool isExecutingScript() const { return false; } > > I suggest spelling it "JavaScript". Generally speaking I don't think functions should have comments listing who calls them. Is there some other way you can say this? I agree, those comments have a high chance of bitrot. They were an attempt to document the previously sprawling API of the DocumentParser. The comments are intended to help someone else (like myself or you) understand the existing backwards interaction between the DocumntParser and the rest of WebCore in hopes of fixing it. > > +}; > > + > > There's an extra semicolon here after the namespace ends. I can never remember which definitions have semicolons after them and which don't. (Classes do, namespaces don't I guess). > > +// FIXME: Why is this different from SVGDocumentExtentions parserLineNumber? > > Spelling error: SVGDocumentExtensions. > > > +static int parserLineNumber(Document* document) > > { > > - DocumentParser* parser = document->parser(); > > + if (document && document->scriptableDocumentParser()) > > + return document->scriptableDocumentParser()->lineNumber() + 1; > > + return 0; > > +} > > It's not good to call a function that does RTTI like scriptableDocumentParser twice in a row. Is there a better way to write this? Local variable and early return perhaps? Woh, that's just wrong. Will fix. > > +// FIXME: TextDocumentParser could just be an HTMLDocumentParser > > +// which started the Tokenizer in the PlainText state. > > Really? Once you're in the PlainText state you can't get out? Yup! The plaintext state is amazing. Throw a <plaintext> tag anywhere in your document and the rest will get eaten. :) Try it in the next blog comment you write... > Seems OK. r=me despite the concerns above Thanks. Will post a new patch with corrections. I'm not super-proud of this patch. It's an attempt to bring more sanity to teh parser system, but I may be over-engineering the problem. Really I just don't understand enough of the parser/loader interactions yet to make the perfect parser API, so this is an attempt at an incremental step towards tomorrow. http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#plaintext-state Note how there is no way to leave the state. Even a </plaintext> tag would be emitted as character tokens from the tokenizer. :) One advantage to just setting the HTMLTokenizer in the PlainText state and using that for the TextDocumentParser is that then we'd get all the \n\r handling for free, as well as any possible scripting needs. It wouldn't automatically get yielding, since yielding is done per-token currently. But we could add some yielding at the character level and then it would yield. Created attachment 59845 [details]
Patch for landing
Comment on attachment 59845 [details]
Patch for landing
Rejecting patch 59845 from commit-queue.
Unexpected failure when processing patch! Please file a bug against webkit-patch.
Failed to run "['./WebKitTools/Scripts/webkit-patch', '--status-host=webkit-commit-queue.appspot.com', 'land-attachment', '--force-clean', '--build', '--non-interactive', '--ignore-builders', '--build-style=both', '--quiet', 59845, '--test', '--parent-command=commit-queue', '--no-update']" exit_code: 1
Last 500 characters of output:
date_status
return NetworkTransaction().run(lambda: self._post_status_to_server(queue_name, status, patch, results_file))
File "/Users/eseidel/Projects/CommitQueue/WebKitTools/Scripts/webkitpy/common/net/networktransaction.py", line 56, in run
self._check_for_timeout()
File "/Users/eseidel/Projects/CommitQueue/WebKitTools/Scripts/webkitpy/common/net/networktransaction.py", line 63, in _check_for_timeout
raise NetworkTimeout()
webkitpy.common.net.networktransaction.NetworkTimeout
(In reply to comment #9) > (In reply to comment #8) > > I'm not sure about the name "scriptable document parser". What does "scriptable" mean here? I think it's a bug that scripts don't work in image documents, for example, so we probably should not be enshrining that bug in our design or class names. If that's what scriptable means. > > Agreed. I'm not exactly sure how you would run a script in an image document however. The user typing a javascript: url? A JavaScript URL is one way, or someone could insert additional elements using JavaScript running in another frame with the same origin. Those elements could contain attributes with script in them. There may be other ways. Created attachment 59873 [details]
Patch for landing
Comment on attachment 59873 [details]
Patch for landing
Rejecting patch 59873 from commit-queue.
Unexpected failure when processing patch! Please file a bug against webkit-patch.
Failed to run "['./WebKitTools/Scripts/webkit-patch', '--status-host=webkit-commit-queue.appspot.com', 'land-attachment', '--force-clean', '--build', '--non-interactive', '--ignore-builders', '--build-style=both', '--quiet', 59873, '--test', '--parent-command=commit-queue', '--no-update']" exit_code: 1
Last 500 characters of output:
date_status
return NetworkTransaction().run(lambda: self._post_status_to_server(queue_name, status, patch, results_file))
File "/Users/eseidel/Projects/CommitQueue/WebKitTools/Scripts/webkitpy/common/net/networktransaction.py", line 56, in run
self._check_for_timeout()
File "/Users/eseidel/Projects/CommitQueue/WebKitTools/Scripts/webkitpy/common/net/networktransaction.py", line 63, in _check_for_timeout
raise NetworkTimeout()
webkitpy.common.net.networktransaction.NetworkTimeout
Created attachment 59878 [details]
Patch for landing
Comment on attachment 59878 [details] Patch for landing Clearing flags on attachment: 59878 Committed r61985: <http://trac.webkit.org/changeset/61985> All reviewed patches have been landed. Closing bug. http://trac.webkit.org/changeset/61985 might have broken Chromium Linux Release Committed r61987: <http://trac.webkit.org/changeset/61987> |