64764 – New Token class for XML

RESOLVED FIXED 64764

New Token class for XML

https://bugs.webkit.org/show_bug.cgi?id=64764

Summary New Token class for XML

Vicki Pfau

Reported 2011-07-18 16:41:28 PDT

For us to have a new XML tokenizer, we first need a well-designed token class.

Attachments
Patch (81.45 KB, patch) 2011-07-18 16:50 PDT, Vicki Pfau	no flags	Details Formatted Diff Diff
Patch (83.79 KB, patch) 2011-07-18 18:06 PDT, Vicki Pfau	no flags	Details Formatted Diff Diff
Patch (83.83 KB, patch) 2011-07-18 18:54 PDT, Vicki Pfau	no flags	Details Formatted Diff Diff
Patch (84.11 KB, patch) 2011-07-20 11:28 PDT, Vicki Pfau	no flags	Details Formatted Diff Diff
Patch (84.26 KB, patch) 2011-07-20 11:34 PDT, Vicki Pfau	no flags	Details Formatted Diff Diff
Patch for landing (84.20 KB, patch) 2011-07-20 12:51 PDT, Vicki Pfau	no flags	Details Formatted Diff Diff
Show Obsolete (5) View All Add attachment proposed patch, testcase, etc.

Vicki Pfau

Comment 1 2011-07-18 16:50:09 PDT

Created attachment 101234 [details] Patch

WebKit Review Bot

Comment 2 2011-07-18 16:54:05 PDT

Comment on attachment 101234 [details] Patch Attachment 101234 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/9117412

Gyuyoung Kim

Comment 3 2011-07-18 16:55:16 PDT

Comment on attachment 101234 [details] Patch Attachment 101234 [details] did not pass efl-ews (efl): Output: http://queues.webkit.org/results/9115453

WebKit Review Bot

Comment 4 2011-07-18 17:01:25 PDT

Comment on attachment 101234 [details] Patch Attachment 101234 [details] did not pass mac-ews (mac): Output: http://queues.webkit.org/results/9106009

Early Warning System Bot

Comment 5 2011-07-18 17:04:59 PDT

Comment on attachment 101234 [details] Patch Attachment 101234 [details] did not pass qt-ews (qt): Output: http://queues.webkit.org/results/9109009

Vicki Pfau

Comment 6 2011-07-18 18:06:34 PDT

Created attachment 101251 [details] Patch

Gyuyoung Kim

Comment 7 2011-07-18 18:12:44 PDT

Comment on attachment 101251 [details] Patch Attachment 101251 [details] did not pass efl-ews (efl): Output: http://queues.webkit.org/results/9112502

WebKit Review Bot

Comment 8 2011-07-18 18:26:57 PDT

Comment on attachment 101251 [details] Patch Attachment 101251 [details] did not pass mac-ews (mac): Output: http://queues.webkit.org/results/9117434

Early Warning System Bot

Comment 9 2011-07-18 18:27:31 PDT

Comment on attachment 101251 [details] Patch Attachment 101251 [details] did not pass qt-ews (qt): Output: http://queues.webkit.org/results/9117438

WebKit Review Bot

Comment 10 2011-07-18 18:33:43 PDT

Comment on attachment 101251 [details] Patch Attachment 101251 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/9110613

Vicki Pfau

Comment 11 2011-07-18 18:54:34 PDT

Created attachment 101257 [details] Patch

Adam Barth

Comment 12 2011-07-19 12:28:29 PDT

Comment on attachment 101257 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=101257&action=review This looks great. I'd like to take one more look to see how you address the nits below. > Source/WebCore/html/parser/HTMLToken.h:45 > + class DoctypeData { This doesn't inherit from DoctypeDataBase ? It looks the same, plus a m_forceQuirks flag. > Source/WebCore/html/parser/HTMLToken.h:51 > + : m_hasPublicIdentifier(false) > + , m_hasSystemIdentifier(false) > + , m_forceQuirks(false) This should be indented four more spaces. > Source/WebCore/html/parser/HTMLToken.h:63 > +class HTMLToken : public TokenBase<HTMLTokenTypes, HTMLTokenTypes::DoctypeData> { I wonder if TokenBase is too general a name. For example, there are also CSS tokens in WebCore. Make MarkupTokenBase ? > Source/WebCore/html/parser/HTMLToken.h:65 > + HTMLToken() : TokenBase<HTMLTokenTypes, HTMLTokenTypes::DoctypeData>() { } Won't the compiler generate this constructor for us? > Source/WebCore/html/parser/HTMLToken.h:70 > + doAppendToName(character); I would have expected this to be TokenBase::appendToName rather than having a "do" prefix. > Source/WebCore/xml/parser/TokenBase.h:36 > +class BaseDoctypeData { BaseDoctypeData => DoctypeDataBase ? > Source/WebCore/xml/parser/XMLToken.h:178 > + void beginEntity() > + { > + ASSERT(m_type == XMLTokenTypes::Uninitialized); > + m_type = XMLTokenTypes::Entity; > + } So you're going to generate tokens for the entities. The HTML tokenizer takes care of expanding them internally, but maybe this approach works better for XML? > Source/WebCore/xml/parser/XMLToken.h:204 > + void printString(const DataVector& string) > + { > + DataVector::const_iterator iter = string.begin(); > + for (; iter != string.end(); ++iter) > + printf("%lc", wchar_t(*iter)); > + } > + > + void printAttrs() > + { > + AttributeList::const_iterator iter = m_attributes.begin(); > + for (; iter != m_attributes.end(); ++iter) { > + printf(" "); > + printString(iter->m_name); > + printf("=\""); > + printString(iter->m_value); > + printf("\""); > + } > + } These seem like they should go on the base class. Maybe guard with #ifndef NDEBUG ? > Source/WebCore/xml/parser/XMLToken.h:206 > + void print() Also probably should be only built in debug. > Source/WebCore/xml/parser/XMLToken.h:335 > + : m_hasStandalone(false) > + , m_hasEncoding(false) > + , m_standalone(false) four more indents.

Adam Barth

Comment 13 2011-07-19 12:29:19 PDT

Comment on attachment 101257 [details] Patch Actually, all those comments are trivial, so I don't need to see this again. Please address them before landing.

Vicki Pfau

Comment 14 2011-07-19 18:41:49 PDT

(In reply to comment #12) > (From update of attachment 101257 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=101257&action=review > > This looks great. I'd like to take one more look to see how you address the nits below. > > > Source/WebCore/html/parser/HTMLToken.h:45 > > + class DoctypeData { > > This doesn't inherit from DoctypeDataBase ? It looks the same, plus a m_forceQuirks flag. This seems to have been an oversight. It was meant to inherit. > > Source/WebCore/html/parser/HTMLToken.h:65 > > + HTMLToken() : TokenBase<HTMLTokenTypes, HTMLTokenTypes::DoctypeData>() { } > > Won't the compiler generate this constructor for us? I was getting linker errors before, but now I don't appear to be... > > Source/WebCore/html/parser/HTMLToken.h:70 > > + doAppendToName(character); > > I would have expected this to be TokenBase::appendToName rather than having a "do" prefix. This is really to avoid naming confusion, but I can change this so that XMLToken::apependToName calls non-virtual TokenBase::appendToName (also name() does the same thing). > > Source/WebCore/xml/parser/XMLToken.h:178 > > + void beginEntity() > > + { > > + ASSERT(m_type == XMLTokenTypes::Uninitialized); > > + m_type = XMLTokenTypes::Entity; > > + } > > So you're going to generate tokens for the entities. The HTML tokenizer takes care of expanding them internally, but maybe this approach works better for XML? I'm not sure what the comment here is supposed to convey. You can't expand entities in XML during the tokenization phase unless they're character references or &, >, <, $apos; or $quot;. All others have to be passed to the parser.

Adam Barth

Comment 15 2011-07-19 19:03:27 PDT

(In reply to comment #14) > (In reply to comment #12) > > (From update of attachment 101257 [details] [details]) > > View in context: https://bugs.webkit.org/attachment.cgi?id=101257&action=review > > > > > Source/WebCore/html/parser/HTMLToken.h:70 > > > + doAppendToName(character); > > > > I would have expected this to be TokenBase::appendToName rather than having a "do" prefix. > > This is really to avoid naming confusion, but I can change this so that XMLToken::apependToName calls non-virtual TokenBase::appendToName (also name() does the same thing). I wouldn't worry about name confusion. You're just overriding the method in the base class. > > > Source/WebCore/xml/parser/XMLToken.h:178 > > > + void beginEntity() > > > + { > > > + ASSERT(m_type == XMLTokenTypes::Uninitialized); > > > + m_type = XMLTokenTypes::Entity; > > > + } > > > > So you're going to generate tokens for the entities. The HTML tokenizer takes care of expanding them internally, but maybe this approach works better for XML? > > I'm not sure what the comment here is supposed to convey. You can't expand entities in XML during the tokenization phase unless they're character references or &, >, <, $apos; or $quot;. All others have to be passed to the parser. The comment is mostly conveying my ignorance about how XML parsing works. :)

Vicki Pfau

Comment 16 2011-07-20 11:28:21 PDT

Created attachment 101487 [details] Patch

Vicki Pfau

Comment 17 2011-07-20 11:34:39 PDT

Created attachment 101488 [details] Patch

Adam Barth

Comment 18 2011-07-20 12:46:50 PDT

Comment on attachment 101488 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=101488&action=review > Source/WebCore/html/parser/HTMLToken.h:49 > + : DoctypeDataBase() The compiler will generate this line for you.

Vicki Pfau

Comment 19 2011-07-20 12:51:12 PDT

Created attachment 101497 [details] Patch for landing

WebKit Review Bot

Comment 20 2011-07-20 13:35:12 PDT

Comment on attachment 101497 [details] Patch for landing Clearing flags on attachment: 101497 Committed r91396: <http://trac.webkit.org/changeset/91396>

WebKit Review Bot

Comment 21 2011-07-20 13:35:19 PDT

All reviewed patches have been landed. Closing bug.

Ryosuke Niwa

Comment 22 2011-07-20 14:42:15 PDT

This patch broke GTK builds: http://build.webkit.org/builders/GTK%20Linux%2032-bit%20Debug/builds/16454 http://build.webkit.org/builders/GTK%20Linux%2064-bit%20Debug/builds/24433

Daniel Bates

Comment 23 2011-07-20 15:16:23 PDT

(In reply to comment #22) > This patch broke GTK builds: > [...] Committed build fix in changeset 91409 <http://trac.webkit.org/changeset/91409>.

Note You need to log in before you can comment on or make changes to this bug.

Status RESOLVED

Resolution FIXED

Priority P2

Severity Normal

Classification Unclassified

Version 528+ (Nightly build)

Hardware Unspecified

OS Unspecified

Product WebKit

Component XML

Assignee

Vicki Pfau

Reported

2011-07-18 16:41 PDT

Modified

2011-07-20 15:16 PDT History

CC List

5 users Show

URL

Keywords

Depends on

Blocks

64566

Dependencies

tree graph