For us to have a new XML tokenizer, we first need a well-designed token class.
Created attachment 101234 [details] Patch
Comment on attachment 101234 [details] Patch Attachment 101234 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/9117412
Comment on attachment 101234 [details] Patch Attachment 101234 [details] did not pass efl-ews (efl): Output: http://queues.webkit.org/results/9115453
Comment on attachment 101234 [details] Patch Attachment 101234 [details] did not pass mac-ews (mac): Output: http://queues.webkit.org/results/9106009
Comment on attachment 101234 [details] Patch Attachment 101234 [details] did not pass qt-ews (qt): Output: http://queues.webkit.org/results/9109009
Created attachment 101251 [details] Patch
Comment on attachment 101251 [details] Patch Attachment 101251 [details] did not pass efl-ews (efl): Output: http://queues.webkit.org/results/9112502
Comment on attachment 101251 [details] Patch Attachment 101251 [details] did not pass mac-ews (mac): Output: http://queues.webkit.org/results/9117434
Comment on attachment 101251 [details] Patch Attachment 101251 [details] did not pass qt-ews (qt): Output: http://queues.webkit.org/results/9117438
Comment on attachment 101251 [details] Patch Attachment 101251 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/9110613
Created attachment 101257 [details] Patch
Comment on attachment 101257 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=101257&action=review This looks great. I'd like to take one more look to see how you address the nits below. > Source/WebCore/html/parser/HTMLToken.h:45 > + class DoctypeData { This doesn't inherit from DoctypeDataBase ? It looks the same, plus a m_forceQuirks flag. > Source/WebCore/html/parser/HTMLToken.h:51 > + : m_hasPublicIdentifier(false) > + , m_hasSystemIdentifier(false) > + , m_forceQuirks(false) This should be indented four more spaces. > Source/WebCore/html/parser/HTMLToken.h:63 > +class HTMLToken : public TokenBase<HTMLTokenTypes, HTMLTokenTypes::DoctypeData> { I wonder if TokenBase is too general a name. For example, there are also CSS tokens in WebCore. Make MarkupTokenBase ? > Source/WebCore/html/parser/HTMLToken.h:65 > + HTMLToken() : TokenBase<HTMLTokenTypes, HTMLTokenTypes::DoctypeData>() { } Won't the compiler generate this constructor for us? > Source/WebCore/html/parser/HTMLToken.h:70 > + doAppendToName(character); I would have expected this to be TokenBase::appendToName rather than having a "do" prefix. > Source/WebCore/xml/parser/TokenBase.h:36 > +class BaseDoctypeData { BaseDoctypeData => DoctypeDataBase ? > Source/WebCore/xml/parser/XMLToken.h:178 > + void beginEntity() > + { > + ASSERT(m_type == XMLTokenTypes::Uninitialized); > + m_type = XMLTokenTypes::Entity; > + } So you're going to generate tokens for the entities. The HTML tokenizer takes care of expanding them internally, but maybe this approach works better for XML? > Source/WebCore/xml/parser/XMLToken.h:204 > + void printString(const DataVector& string) > + { > + DataVector::const_iterator iter = string.begin(); > + for (; iter != string.end(); ++iter) > + printf("%lc", wchar_t(*iter)); > + } > + > + void printAttrs() > + { > + AttributeList::const_iterator iter = m_attributes.begin(); > + for (; iter != m_attributes.end(); ++iter) { > + printf(" "); > + printString(iter->m_name); > + printf("=\""); > + printString(iter->m_value); > + printf("\""); > + } > + } These seem like they should go on the base class. Maybe guard with #ifndef NDEBUG ? > Source/WebCore/xml/parser/XMLToken.h:206 > + void print() Also probably should be only built in debug. > Source/WebCore/xml/parser/XMLToken.h:335 > + : m_hasStandalone(false) > + , m_hasEncoding(false) > + , m_standalone(false) four more indents.
Comment on attachment 101257 [details] Patch Actually, all those comments are trivial, so I don't need to see this again. Please address them before landing.
(In reply to comment #12) > (From update of attachment 101257 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=101257&action=review > > This looks great. I'd like to take one more look to see how you address the nits below. > > > Source/WebCore/html/parser/HTMLToken.h:45 > > + class DoctypeData { > > This doesn't inherit from DoctypeDataBase ? It looks the same, plus a m_forceQuirks flag. This seems to have been an oversight. It was meant to inherit. > > Source/WebCore/html/parser/HTMLToken.h:65 > > + HTMLToken() : TokenBase<HTMLTokenTypes, HTMLTokenTypes::DoctypeData>() { } > > Won't the compiler generate this constructor for us? I was getting linker errors before, but now I don't appear to be... > > Source/WebCore/html/parser/HTMLToken.h:70 > > + doAppendToName(character); > > I would have expected this to be TokenBase::appendToName rather than having a "do" prefix. This is really to avoid naming confusion, but I can change this so that XMLToken::apependToName calls non-virtual TokenBase::appendToName (also name() does the same thing). > > Source/WebCore/xml/parser/XMLToken.h:178 > > + void beginEntity() > > + { > > + ASSERT(m_type == XMLTokenTypes::Uninitialized); > > + m_type = XMLTokenTypes::Entity; > > + } > > So you're going to generate tokens for the entities. The HTML tokenizer takes care of expanding them internally, but maybe this approach works better for XML? I'm not sure what the comment here is supposed to convey. You can't expand entities in XML during the tokenization phase unless they're character references or &, >, <, $apos; or $quot;. All others have to be passed to the parser.
(In reply to comment #14) > (In reply to comment #12) > > (From update of attachment 101257 [details] [details]) > > View in context: https://bugs.webkit.org/attachment.cgi?id=101257&action=review > > > > > Source/WebCore/html/parser/HTMLToken.h:70 > > > + doAppendToName(character); > > > > I would have expected this to be TokenBase::appendToName rather than having a "do" prefix. > > This is really to avoid naming confusion, but I can change this so that XMLToken::apependToName calls non-virtual TokenBase::appendToName (also name() does the same thing). I wouldn't worry about name confusion. You're just overriding the method in the base class. > > > Source/WebCore/xml/parser/XMLToken.h:178 > > > + void beginEntity() > > > + { > > > + ASSERT(m_type == XMLTokenTypes::Uninitialized); > > > + m_type = XMLTokenTypes::Entity; > > > + } > > > > So you're going to generate tokens for the entities. The HTML tokenizer takes care of expanding them internally, but maybe this approach works better for XML? > > I'm not sure what the comment here is supposed to convey. You can't expand entities in XML during the tokenization phase unless they're character references or &, >, <, $apos; or $quot;. All others have to be passed to the parser. The comment is mostly conveying my ignorance about how XML parsing works. :)
Created attachment 101487 [details] Patch
Created attachment 101488 [details] Patch
Comment on attachment 101488 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=101488&action=review > Source/WebCore/html/parser/HTMLToken.h:49 > + : DoctypeDataBase() The compiler will generate this line for you.
Created attachment 101497 [details] Patch for landing
Comment on attachment 101497 [details] Patch for landing Clearing flags on attachment: 101497 Committed r91396: <http://trac.webkit.org/changeset/91396>
All reviewed patches have been landed. Closing bug.
This patch broke GTK builds: http://build.webkit.org/builders/GTK%20Linux%2032-bit%20Debug/builds/16454 http://build.webkit.org/builders/GTK%20Linux%2064-bit%20Debug/builds/24433
(In reply to comment #22) > This patch broke GTK builds: > [...] Committed build fix in changeset 91409 <http://trac.webkit.org/changeset/91409>.