Bug 64764 - New Token class for XML
Summary: New Token class for XML
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: XML (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Vicki Pfau
URL:
Keywords:
Depends on:
Blocks: 64566
  Show dependency treegraph
 
Reported: 2011-07-18 16:41 PDT by Vicki Pfau
Modified: 2011-07-20 15:16 PDT (History)
5 users (show)

See Also:


Attachments
Patch (81.45 KB, patch)
2011-07-18 16:50 PDT, Vicki Pfau
no flags Details | Formatted Diff | Diff
Patch (83.79 KB, patch)
2011-07-18 18:06 PDT, Vicki Pfau
no flags Details | Formatted Diff | Diff
Patch (83.83 KB, patch)
2011-07-18 18:54 PDT, Vicki Pfau
no flags Details | Formatted Diff | Diff
Patch (84.11 KB, patch)
2011-07-20 11:28 PDT, Vicki Pfau
no flags Details | Formatted Diff | Diff
Patch (84.26 KB, patch)
2011-07-20 11:34 PDT, Vicki Pfau
no flags Details | Formatted Diff | Diff
Patch for landing (84.20 KB, patch)
2011-07-20 12:51 PDT, Vicki Pfau
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Vicki Pfau 2011-07-18 16:41:28 PDT
For us to have a new XML tokenizer, we first need a well-designed token class.
Comment 1 Vicki Pfau 2011-07-18 16:50:09 PDT
Created attachment 101234 [details]
Patch
Comment 2 WebKit Review Bot 2011-07-18 16:54:05 PDT
Comment on attachment 101234 [details]
Patch

Attachment 101234 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/9117412
Comment 3 Gyuyoung Kim 2011-07-18 16:55:16 PDT
Comment on attachment 101234 [details]
Patch

Attachment 101234 [details] did not pass efl-ews (efl):
Output: http://queues.webkit.org/results/9115453
Comment 4 WebKit Review Bot 2011-07-18 17:01:25 PDT
Comment on attachment 101234 [details]
Patch

Attachment 101234 [details] did not pass mac-ews (mac):
Output: http://queues.webkit.org/results/9106009
Comment 5 Early Warning System Bot 2011-07-18 17:04:59 PDT
Comment on attachment 101234 [details]
Patch

Attachment 101234 [details] did not pass qt-ews (qt):
Output: http://queues.webkit.org/results/9109009
Comment 6 Vicki Pfau 2011-07-18 18:06:34 PDT
Created attachment 101251 [details]
Patch
Comment 7 Gyuyoung Kim 2011-07-18 18:12:44 PDT
Comment on attachment 101251 [details]
Patch

Attachment 101251 [details] did not pass efl-ews (efl):
Output: http://queues.webkit.org/results/9112502
Comment 8 WebKit Review Bot 2011-07-18 18:26:57 PDT
Comment on attachment 101251 [details]
Patch

Attachment 101251 [details] did not pass mac-ews (mac):
Output: http://queues.webkit.org/results/9117434
Comment 9 Early Warning System Bot 2011-07-18 18:27:31 PDT
Comment on attachment 101251 [details]
Patch

Attachment 101251 [details] did not pass qt-ews (qt):
Output: http://queues.webkit.org/results/9117438
Comment 10 WebKit Review Bot 2011-07-18 18:33:43 PDT
Comment on attachment 101251 [details]
Patch

Attachment 101251 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/9110613
Comment 11 Vicki Pfau 2011-07-18 18:54:34 PDT
Created attachment 101257 [details]
Patch
Comment 12 Adam Barth 2011-07-19 12:28:29 PDT
Comment on attachment 101257 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=101257&action=review

This looks great.  I'd like to take one more look to see how you address the nits below.

> Source/WebCore/html/parser/HTMLToken.h:45
> +    class DoctypeData {

This doesn't inherit from DoctypeDataBase ?  It looks the same, plus a m_forceQuirks flag.

> Source/WebCore/html/parser/HTMLToken.h:51
> +        : m_hasPublicIdentifier(false)
> +        , m_hasSystemIdentifier(false)
> +        , m_forceQuirks(false)

This should be indented four more spaces.

> Source/WebCore/html/parser/HTMLToken.h:63
> +class HTMLToken : public TokenBase<HTMLTokenTypes, HTMLTokenTypes::DoctypeData> {

I wonder if TokenBase is too general a name.  For example, there are also CSS tokens in WebCore.  Make MarkupTokenBase ?

> Source/WebCore/html/parser/HTMLToken.h:65
> +    HTMLToken() : TokenBase<HTMLTokenTypes, HTMLTokenTypes::DoctypeData>() { }

Won't the compiler generate this constructor for us?

> Source/WebCore/html/parser/HTMLToken.h:70
> +        doAppendToName(character);

I would have expected this to be TokenBase::appendToName rather than having a "do" prefix.

> Source/WebCore/xml/parser/TokenBase.h:36
> +class BaseDoctypeData {

BaseDoctypeData => DoctypeDataBase ?

> Source/WebCore/xml/parser/XMLToken.h:178
> +    void beginEntity()
> +    {
> +        ASSERT(m_type == XMLTokenTypes::Uninitialized);
> +        m_type = XMLTokenTypes::Entity;
> +    }

So you're going to generate tokens for the entities.  The HTML tokenizer takes care of expanding them internally, but maybe this approach works better for XML?

> Source/WebCore/xml/parser/XMLToken.h:204
> +    void printString(const DataVector& string)
> +    {
> +        DataVector::const_iterator iter = string.begin();
> +        for (; iter != string.end(); ++iter)
> +            printf("%lc", wchar_t(*iter));
> +    }
> +
> +    void printAttrs()
> +    {
> +        AttributeList::const_iterator iter = m_attributes.begin();
> +        for (; iter != m_attributes.end(); ++iter) {
> +            printf(" ");
> +            printString(iter->m_name);
> +            printf("=\"");
> +            printString(iter->m_value);
> +            printf("\"");
> +        }
> +    }

These seem like they should go on the base class.  Maybe guard with #ifndef NDEBUG ?

> Source/WebCore/xml/parser/XMLToken.h:206
> +    void print()

Also probably should be only built in debug.

> Source/WebCore/xml/parser/XMLToken.h:335
> +        : m_hasStandalone(false)
> +        , m_hasEncoding(false)
> +        , m_standalone(false)

four more indents.
Comment 13 Adam Barth 2011-07-19 12:29:19 PDT
Comment on attachment 101257 [details]
Patch

Actually, all those comments are trivial, so I don't need to see this again.  Please address them before landing.
Comment 14 Vicki Pfau 2011-07-19 18:41:49 PDT
(In reply to comment #12)
> (From update of attachment 101257 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=101257&action=review
> 
> This looks great.  I'd like to take one more look to see how you address the nits below.
> 
> > Source/WebCore/html/parser/HTMLToken.h:45
> > +    class DoctypeData {
> 
> This doesn't inherit from DoctypeDataBase ?  It looks the same, plus a m_forceQuirks flag.

This seems to have been an oversight. It was meant to inherit.

> > Source/WebCore/html/parser/HTMLToken.h:65
> > +    HTMLToken() : TokenBase<HTMLTokenTypes, HTMLTokenTypes::DoctypeData>() { }
> 
> Won't the compiler generate this constructor for us?

I was getting linker errors before, but now I don't appear to be...

> > Source/WebCore/html/parser/HTMLToken.h:70
> > +        doAppendToName(character);
> 
> I would have expected this to be TokenBase::appendToName rather than having a "do" prefix.

This is really to avoid naming confusion, but I can change this so that XMLToken::apependToName calls non-virtual TokenBase::appendToName (also name() does the same thing).

> > Source/WebCore/xml/parser/XMLToken.h:178
> > +    void beginEntity()
> > +    {
> > +        ASSERT(m_type == XMLTokenTypes::Uninitialized);
> > +        m_type = XMLTokenTypes::Entity;
> > +    }
> 
> So you're going to generate tokens for the entities.  The HTML tokenizer takes care of expanding them internally, but maybe this approach works better for XML?

I'm not sure what the comment here is supposed to convey. You can't expand entities in XML during the tokenization phase unless they're character references or &amp;, &gt;, &lt;, $apos; or $quot;. All others have to be passed to the parser.
Comment 15 Adam Barth 2011-07-19 19:03:27 PDT
(In reply to comment #14)
> (In reply to comment #12)
> > (From update of attachment 101257 [details] [details])
> > View in context: https://bugs.webkit.org/attachment.cgi?id=101257&action=review
> > 
> > > Source/WebCore/html/parser/HTMLToken.h:70
> > > +        doAppendToName(character);
> > 
> > I would have expected this to be TokenBase::appendToName rather than having a "do" prefix.
> 
> This is really to avoid naming confusion, but I can change this so that XMLToken::apependToName calls non-virtual TokenBase::appendToName (also name() does the same thing).

I wouldn't worry about name confusion.  You're just overriding the method in the base class.

> > > Source/WebCore/xml/parser/XMLToken.h:178
> > > +    void beginEntity()
> > > +    {
> > > +        ASSERT(m_type == XMLTokenTypes::Uninitialized);
> > > +        m_type = XMLTokenTypes::Entity;
> > > +    }
> > 
> > So you're going to generate tokens for the entities.  The HTML tokenizer takes care of expanding them internally, but maybe this approach works better for XML?
> 
> I'm not sure what the comment here is supposed to convey. You can't expand entities in XML during the tokenization phase unless they're character references or &amp;, &gt;, &lt;, $apos; or $quot;. All others have to be passed to the parser.

The comment is mostly conveying my ignorance about how XML parsing works.  :)
Comment 16 Vicki Pfau 2011-07-20 11:28:21 PDT
Created attachment 101487 [details]
Patch
Comment 17 Vicki Pfau 2011-07-20 11:34:39 PDT
Created attachment 101488 [details]
Patch
Comment 18 Adam Barth 2011-07-20 12:46:50 PDT
Comment on attachment 101488 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=101488&action=review

> Source/WebCore/html/parser/HTMLToken.h:49
> +            : DoctypeDataBase()

The compiler will generate this line for you.
Comment 19 Vicki Pfau 2011-07-20 12:51:12 PDT
Created attachment 101497 [details]
Patch for landing
Comment 20 WebKit Review Bot 2011-07-20 13:35:12 PDT
Comment on attachment 101497 [details]
Patch for landing

Clearing flags on attachment: 101497

Committed r91396: <http://trac.webkit.org/changeset/91396>
Comment 21 WebKit Review Bot 2011-07-20 13:35:19 PDT
All reviewed patches have been landed.  Closing bug.
Comment 23 Daniel Bates 2011-07-20 15:16:23 PDT
(In reply to comment #22)
> This patch broke GTK builds:
> [...]

Committed build fix in changeset 91409 <http://trac.webkit.org/changeset/91409>.