RESOLVED FIXED 64764
New Token class for XML
https://bugs.webkit.org/show_bug.cgi?id=64764
Summary New Token class for XML
Vicki Pfau
Reported 2011-07-18 16:41:28 PDT
For us to have a new XML tokenizer, we first need a well-designed token class.
Attachments
Patch (81.45 KB, patch)
2011-07-18 16:50 PDT, Vicki Pfau
no flags
Patch (83.79 KB, patch)
2011-07-18 18:06 PDT, Vicki Pfau
no flags
Patch (83.83 KB, patch)
2011-07-18 18:54 PDT, Vicki Pfau
no flags
Patch (84.11 KB, patch)
2011-07-20 11:28 PDT, Vicki Pfau
no flags
Patch (84.26 KB, patch)
2011-07-20 11:34 PDT, Vicki Pfau
no flags
Patch for landing (84.20 KB, patch)
2011-07-20 12:51 PDT, Vicki Pfau
no flags
Vicki Pfau
Comment 1 2011-07-18 16:50:09 PDT
WebKit Review Bot
Comment 2 2011-07-18 16:54:05 PDT
Comment on attachment 101234 [details] Patch Attachment 101234 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/9117412
Gyuyoung Kim
Comment 3 2011-07-18 16:55:16 PDT
WebKit Review Bot
Comment 4 2011-07-18 17:01:25 PDT
Early Warning System Bot
Comment 5 2011-07-18 17:04:59 PDT
Vicki Pfau
Comment 6 2011-07-18 18:06:34 PDT
Gyuyoung Kim
Comment 7 2011-07-18 18:12:44 PDT
WebKit Review Bot
Comment 8 2011-07-18 18:26:57 PDT
Early Warning System Bot
Comment 9 2011-07-18 18:27:31 PDT
WebKit Review Bot
Comment 10 2011-07-18 18:33:43 PDT
Comment on attachment 101251 [details] Patch Attachment 101251 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/9110613
Vicki Pfau
Comment 11 2011-07-18 18:54:34 PDT
Adam Barth
Comment 12 2011-07-19 12:28:29 PDT
Comment on attachment 101257 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=101257&action=review This looks great. I'd like to take one more look to see how you address the nits below. > Source/WebCore/html/parser/HTMLToken.h:45 > + class DoctypeData { This doesn't inherit from DoctypeDataBase ? It looks the same, plus a m_forceQuirks flag. > Source/WebCore/html/parser/HTMLToken.h:51 > + : m_hasPublicIdentifier(false) > + , m_hasSystemIdentifier(false) > + , m_forceQuirks(false) This should be indented four more spaces. > Source/WebCore/html/parser/HTMLToken.h:63 > +class HTMLToken : public TokenBase<HTMLTokenTypes, HTMLTokenTypes::DoctypeData> { I wonder if TokenBase is too general a name. For example, there are also CSS tokens in WebCore. Make MarkupTokenBase ? > Source/WebCore/html/parser/HTMLToken.h:65 > + HTMLToken() : TokenBase<HTMLTokenTypes, HTMLTokenTypes::DoctypeData>() { } Won't the compiler generate this constructor for us? > Source/WebCore/html/parser/HTMLToken.h:70 > + doAppendToName(character); I would have expected this to be TokenBase::appendToName rather than having a "do" prefix. > Source/WebCore/xml/parser/TokenBase.h:36 > +class BaseDoctypeData { BaseDoctypeData => DoctypeDataBase ? > Source/WebCore/xml/parser/XMLToken.h:178 > + void beginEntity() > + { > + ASSERT(m_type == XMLTokenTypes::Uninitialized); > + m_type = XMLTokenTypes::Entity; > + } So you're going to generate tokens for the entities. The HTML tokenizer takes care of expanding them internally, but maybe this approach works better for XML? > Source/WebCore/xml/parser/XMLToken.h:204 > + void printString(const DataVector& string) > + { > + DataVector::const_iterator iter = string.begin(); > + for (; iter != string.end(); ++iter) > + printf("%lc", wchar_t(*iter)); > + } > + > + void printAttrs() > + { > + AttributeList::const_iterator iter = m_attributes.begin(); > + for (; iter != m_attributes.end(); ++iter) { > + printf(" "); > + printString(iter->m_name); > + printf("=\""); > + printString(iter->m_value); > + printf("\""); > + } > + } These seem like they should go on the base class. Maybe guard with #ifndef NDEBUG ? > Source/WebCore/xml/parser/XMLToken.h:206 > + void print() Also probably should be only built in debug. > Source/WebCore/xml/parser/XMLToken.h:335 > + : m_hasStandalone(false) > + , m_hasEncoding(false) > + , m_standalone(false) four more indents.
Adam Barth
Comment 13 2011-07-19 12:29:19 PDT
Comment on attachment 101257 [details] Patch Actually, all those comments are trivial, so I don't need to see this again. Please address them before landing.
Vicki Pfau
Comment 14 2011-07-19 18:41:49 PDT
(In reply to comment #12) > (From update of attachment 101257 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=101257&action=review > > This looks great. I'd like to take one more look to see how you address the nits below. > > > Source/WebCore/html/parser/HTMLToken.h:45 > > + class DoctypeData { > > This doesn't inherit from DoctypeDataBase ? It looks the same, plus a m_forceQuirks flag. This seems to have been an oversight. It was meant to inherit. > > Source/WebCore/html/parser/HTMLToken.h:65 > > + HTMLToken() : TokenBase<HTMLTokenTypes, HTMLTokenTypes::DoctypeData>() { } > > Won't the compiler generate this constructor for us? I was getting linker errors before, but now I don't appear to be... > > Source/WebCore/html/parser/HTMLToken.h:70 > > + doAppendToName(character); > > I would have expected this to be TokenBase::appendToName rather than having a "do" prefix. This is really to avoid naming confusion, but I can change this so that XMLToken::apependToName calls non-virtual TokenBase::appendToName (also name() does the same thing). > > Source/WebCore/xml/parser/XMLToken.h:178 > > + void beginEntity() > > + { > > + ASSERT(m_type == XMLTokenTypes::Uninitialized); > > + m_type = XMLTokenTypes::Entity; > > + } > > So you're going to generate tokens for the entities. The HTML tokenizer takes care of expanding them internally, but maybe this approach works better for XML? I'm not sure what the comment here is supposed to convey. You can't expand entities in XML during the tokenization phase unless they're character references or &amp;, &gt;, &lt;, $apos; or $quot;. All others have to be passed to the parser.
Adam Barth
Comment 15 2011-07-19 19:03:27 PDT
(In reply to comment #14) > (In reply to comment #12) > > (From update of attachment 101257 [details] [details]) > > View in context: https://bugs.webkit.org/attachment.cgi?id=101257&action=review > > > > > Source/WebCore/html/parser/HTMLToken.h:70 > > > + doAppendToName(character); > > > > I would have expected this to be TokenBase::appendToName rather than having a "do" prefix. > > This is really to avoid naming confusion, but I can change this so that XMLToken::apependToName calls non-virtual TokenBase::appendToName (also name() does the same thing). I wouldn't worry about name confusion. You're just overriding the method in the base class. > > > Source/WebCore/xml/parser/XMLToken.h:178 > > > + void beginEntity() > > > + { > > > + ASSERT(m_type == XMLTokenTypes::Uninitialized); > > > + m_type = XMLTokenTypes::Entity; > > > + } > > > > So you're going to generate tokens for the entities. The HTML tokenizer takes care of expanding them internally, but maybe this approach works better for XML? > > I'm not sure what the comment here is supposed to convey. You can't expand entities in XML during the tokenization phase unless they're character references or &amp;, &gt;, &lt;, $apos; or $quot;. All others have to be passed to the parser. The comment is mostly conveying my ignorance about how XML parsing works. :)
Vicki Pfau
Comment 16 2011-07-20 11:28:21 PDT
Vicki Pfau
Comment 17 2011-07-20 11:34:39 PDT
Adam Barth
Comment 18 2011-07-20 12:46:50 PDT
Comment on attachment 101488 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=101488&action=review > Source/WebCore/html/parser/HTMLToken.h:49 > + : DoctypeDataBase() The compiler will generate this line for you.
Vicki Pfau
Comment 19 2011-07-20 12:51:12 PDT
Created attachment 101497 [details] Patch for landing
WebKit Review Bot
Comment 20 2011-07-20 13:35:12 PDT
Comment on attachment 101497 [details] Patch for landing Clearing flags on attachment: 101497 Committed r91396: <http://trac.webkit.org/changeset/91396>
WebKit Review Bot
Comment 21 2011-07-20 13:35:19 PDT
All reviewed patches have been landed. Closing bug.
Daniel Bates
Comment 23 2011-07-20 15:16:23 PDT
(In reply to comment #22) > This patch broke GTK builds: > [...] Committed build fix in changeset 91409 <http://trac.webkit.org/changeset/91409>.
Note You need to log in before you can comment on or make changes to this bug.