WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
64764
New Token class for XML
https://bugs.webkit.org/show_bug.cgi?id=64764
Summary
New Token class for XML
Vicki Pfau
Reported
2011-07-18 16:41:28 PDT
For us to have a new XML tokenizer, we first need a well-designed token class.
Attachments
Patch
(81.45 KB, patch)
2011-07-18 16:50 PDT
,
Vicki Pfau
no flags
Details
Formatted Diff
Diff
Patch
(83.79 KB, patch)
2011-07-18 18:06 PDT
,
Vicki Pfau
no flags
Details
Formatted Diff
Diff
Patch
(83.83 KB, patch)
2011-07-18 18:54 PDT
,
Vicki Pfau
no flags
Details
Formatted Diff
Diff
Patch
(84.11 KB, patch)
2011-07-20 11:28 PDT
,
Vicki Pfau
no flags
Details
Formatted Diff
Diff
Patch
(84.26 KB, patch)
2011-07-20 11:34 PDT
,
Vicki Pfau
no flags
Details
Formatted Diff
Diff
Patch for landing
(84.20 KB, patch)
2011-07-20 12:51 PDT
,
Vicki Pfau
no flags
Details
Formatted Diff
Diff
Show Obsolete
(5)
View All
Add attachment
proposed patch, testcase, etc.
Vicki Pfau
Comment 1
2011-07-18 16:50:09 PDT
Created
attachment 101234
[details]
Patch
WebKit Review Bot
Comment 2
2011-07-18 16:54:05 PDT
Comment on
attachment 101234
[details]
Patch
Attachment 101234
[details]
did not pass chromium-ews (chromium-xvfb): Output:
http://queues.webkit.org/results/9117412
Gyuyoung Kim
Comment 3
2011-07-18 16:55:16 PDT
Comment on
attachment 101234
[details]
Patch
Attachment 101234
[details]
did not pass efl-ews (efl): Output:
http://queues.webkit.org/results/9115453
WebKit Review Bot
Comment 4
2011-07-18 17:01:25 PDT
Comment on
attachment 101234
[details]
Patch
Attachment 101234
[details]
did not pass mac-ews (mac): Output:
http://queues.webkit.org/results/9106009
Early Warning System Bot
Comment 5
2011-07-18 17:04:59 PDT
Comment on
attachment 101234
[details]
Patch
Attachment 101234
[details]
did not pass qt-ews (qt): Output:
http://queues.webkit.org/results/9109009
Vicki Pfau
Comment 6
2011-07-18 18:06:34 PDT
Created
attachment 101251
[details]
Patch
Gyuyoung Kim
Comment 7
2011-07-18 18:12:44 PDT
Comment on
attachment 101251
[details]
Patch
Attachment 101251
[details]
did not pass efl-ews (efl): Output:
http://queues.webkit.org/results/9112502
WebKit Review Bot
Comment 8
2011-07-18 18:26:57 PDT
Comment on
attachment 101251
[details]
Patch
Attachment 101251
[details]
did not pass mac-ews (mac): Output:
http://queues.webkit.org/results/9117434
Early Warning System Bot
Comment 9
2011-07-18 18:27:31 PDT
Comment on
attachment 101251
[details]
Patch
Attachment 101251
[details]
did not pass qt-ews (qt): Output:
http://queues.webkit.org/results/9117438
WebKit Review Bot
Comment 10
2011-07-18 18:33:43 PDT
Comment on
attachment 101251
[details]
Patch
Attachment 101251
[details]
did not pass chromium-ews (chromium-xvfb): Output:
http://queues.webkit.org/results/9110613
Vicki Pfau
Comment 11
2011-07-18 18:54:34 PDT
Created
attachment 101257
[details]
Patch
Adam Barth
Comment 12
2011-07-19 12:28:29 PDT
Comment on
attachment 101257
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=101257&action=review
This looks great. I'd like to take one more look to see how you address the nits below.
> Source/WebCore/html/parser/HTMLToken.h:45 > + class DoctypeData {
This doesn't inherit from DoctypeDataBase ? It looks the same, plus a m_forceQuirks flag.
> Source/WebCore/html/parser/HTMLToken.h:51 > + : m_hasPublicIdentifier(false) > + , m_hasSystemIdentifier(false) > + , m_forceQuirks(false)
This should be indented four more spaces.
> Source/WebCore/html/parser/HTMLToken.h:63 > +class HTMLToken : public TokenBase<HTMLTokenTypes, HTMLTokenTypes::DoctypeData> {
I wonder if TokenBase is too general a name. For example, there are also CSS tokens in WebCore. Make MarkupTokenBase ?
> Source/WebCore/html/parser/HTMLToken.h:65 > + HTMLToken() : TokenBase<HTMLTokenTypes, HTMLTokenTypes::DoctypeData>() { }
Won't the compiler generate this constructor for us?
> Source/WebCore/html/parser/HTMLToken.h:70 > + doAppendToName(character);
I would have expected this to be TokenBase::appendToName rather than having a "do" prefix.
> Source/WebCore/xml/parser/TokenBase.h:36 > +class BaseDoctypeData {
BaseDoctypeData => DoctypeDataBase ?
> Source/WebCore/xml/parser/XMLToken.h:178 > + void beginEntity() > + { > + ASSERT(m_type == XMLTokenTypes::Uninitialized); > + m_type = XMLTokenTypes::Entity; > + }
So you're going to generate tokens for the entities. The HTML tokenizer takes care of expanding them internally, but maybe this approach works better for XML?
> Source/WebCore/xml/parser/XMLToken.h:204 > + void printString(const DataVector& string) > + { > + DataVector::const_iterator iter = string.begin(); > + for (; iter != string.end(); ++iter) > + printf("%lc", wchar_t(*iter)); > + } > + > + void printAttrs() > + { > + AttributeList::const_iterator iter = m_attributes.begin(); > + for (; iter != m_attributes.end(); ++iter) { > + printf(" "); > + printString(iter->m_name); > + printf("=\""); > + printString(iter->m_value); > + printf("\""); > + } > + }
These seem like they should go on the base class. Maybe guard with #ifndef NDEBUG ?
> Source/WebCore/xml/parser/XMLToken.h:206 > + void print()
Also probably should be only built in debug.
> Source/WebCore/xml/parser/XMLToken.h:335 > + : m_hasStandalone(false) > + , m_hasEncoding(false) > + , m_standalone(false)
four more indents.
Adam Barth
Comment 13
2011-07-19 12:29:19 PDT
Comment on
attachment 101257
[details]
Patch Actually, all those comments are trivial, so I don't need to see this again. Please address them before landing.
Vicki Pfau
Comment 14
2011-07-19 18:41:49 PDT
(In reply to
comment #12
)
> (From update of
attachment 101257
[details]
) > View in context:
https://bugs.webkit.org/attachment.cgi?id=101257&action=review
> > This looks great. I'd like to take one more look to see how you address the nits below. > > > Source/WebCore/html/parser/HTMLToken.h:45 > > + class DoctypeData { > > This doesn't inherit from DoctypeDataBase ? It looks the same, plus a m_forceQuirks flag.
This seems to have been an oversight. It was meant to inherit.
> > Source/WebCore/html/parser/HTMLToken.h:65 > > + HTMLToken() : TokenBase<HTMLTokenTypes, HTMLTokenTypes::DoctypeData>() { } > > Won't the compiler generate this constructor for us?
I was getting linker errors before, but now I don't appear to be...
> > Source/WebCore/html/parser/HTMLToken.h:70 > > + doAppendToName(character); > > I would have expected this to be TokenBase::appendToName rather than having a "do" prefix.
This is really to avoid naming confusion, but I can change this so that XMLToken::apependToName calls non-virtual TokenBase::appendToName (also name() does the same thing).
> > Source/WebCore/xml/parser/XMLToken.h:178 > > + void beginEntity() > > + { > > + ASSERT(m_type == XMLTokenTypes::Uninitialized); > > + m_type = XMLTokenTypes::Entity; > > + } > > So you're going to generate tokens for the entities. The HTML tokenizer takes care of expanding them internally, but maybe this approach works better for XML?
I'm not sure what the comment here is supposed to convey. You can't expand entities in XML during the tokenization phase unless they're character references or &, >, <, $apos; or $quot;. All others have to be passed to the parser.
Adam Barth
Comment 15
2011-07-19 19:03:27 PDT
(In reply to
comment #14
)
> (In reply to
comment #12
) > > (From update of
attachment 101257
[details]
[details]) > > View in context:
https://bugs.webkit.org/attachment.cgi?id=101257&action=review
> > > > > Source/WebCore/html/parser/HTMLToken.h:70 > > > + doAppendToName(character); > > > > I would have expected this to be TokenBase::appendToName rather than having a "do" prefix. > > This is really to avoid naming confusion, but I can change this so that XMLToken::apependToName calls non-virtual TokenBase::appendToName (also name() does the same thing).
I wouldn't worry about name confusion. You're just overriding the method in the base class.
> > > Source/WebCore/xml/parser/XMLToken.h:178 > > > + void beginEntity() > > > + { > > > + ASSERT(m_type == XMLTokenTypes::Uninitialized); > > > + m_type = XMLTokenTypes::Entity; > > > + } > > > > So you're going to generate tokens for the entities. The HTML tokenizer takes care of expanding them internally, but maybe this approach works better for XML? > > I'm not sure what the comment here is supposed to convey. You can't expand entities in XML during the tokenization phase unless they're character references or &, >, <, $apos; or $quot;. All others have to be passed to the parser.
The comment is mostly conveying my ignorance about how XML parsing works. :)
Vicki Pfau
Comment 16
2011-07-20 11:28:21 PDT
Created
attachment 101487
[details]
Patch
Vicki Pfau
Comment 17
2011-07-20 11:34:39 PDT
Created
attachment 101488
[details]
Patch
Adam Barth
Comment 18
2011-07-20 12:46:50 PDT
Comment on
attachment 101488
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=101488&action=review
> Source/WebCore/html/parser/HTMLToken.h:49 > + : DoctypeDataBase()
The compiler will generate this line for you.
Vicki Pfau
Comment 19
2011-07-20 12:51:12 PDT
Created
attachment 101497
[details]
Patch for landing
WebKit Review Bot
Comment 20
2011-07-20 13:35:12 PDT
Comment on
attachment 101497
[details]
Patch for landing Clearing flags on attachment: 101497 Committed
r91396
: <
http://trac.webkit.org/changeset/91396
>
WebKit Review Bot
Comment 21
2011-07-20 13:35:19 PDT
All reviewed patches have been landed. Closing bug.
Ryosuke Niwa
Comment 22
2011-07-20 14:42:15 PDT
This patch broke GTK builds:
http://build.webkit.org/builders/GTK%20Linux%2032-bit%20Debug/builds/16454
http://build.webkit.org/builders/GTK%20Linux%2064-bit%20Debug/builds/24433
Daniel Bates
Comment 23
2011-07-20 15:16:23 PDT
(In reply to
comment #22
)
> This patch broke GTK builds: > [...]
Committed build fix in changeset 91409 <
http://trac.webkit.org/changeset/91409
>.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug