For the new XML parser, we'll need a tokenizer for the source before we can parse the tokens.
Created attachment 101943 [details] Patch
Attachment 101943 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/ChangeLog', u'Source/WebCor..." exit_code: 1 Source/WebCore/xml/parser/XMLTokenizer.cpp:227: Non-label code inside switch statements should be indented. [whitespace/indent] [4] Total errors found: 1 in 11 files If any of these errors are false positives, please file a bug against check-webkit-style.
Comment on attachment 101943 [details] Patch Attachment 101943 [details] did not pass cr-mac-ews (chromium): Output: http://queues.webkit.org/results/9253101
Comment on attachment 101943 [details] Patch Attachment 101943 [details] did not pass mac-ews (mac): Output: http://queues.webkit.org/results/9253107
Comment on attachment 101943 [details] Patch Attachment 101943 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/9250169
Created attachment 101957 [details] Patch
Comment on attachment 101943 [details] Patch
View in context: https://bugs.webkit.org/attachment.cgi?id=101943&action=review
This looks like a great start. Obviously, it would be nice for it to build before landing. :)
> Source/WebCore/xml/parser/NewXMLDocumentParser.cpp:62
> +    m_token.print();
DEBUG only? This function will obviously evolve. :)
> Source/WebCore/xml/parser/XMLTokenizer.cpp:47
> +inline bool isNumeric(UChar cc)
cc => c (I need to go through the HTML parser and fix this there too.)
> Source/WebCore/xml/parser/XMLTokenizer.cpp:55
> +inline bool isLetter(UChar cc)
> +{
> +    return (cc >= 'A' && cc <= 'Z') || (cc >= 'a' && cc <= 'z');
> +}
I think these functions already exist in ASCIITypes.h
> Source/WebCore/xml/parser/XMLTokenizer.cpp:62
> +inline bool isValidNameStart(UChar cc)
You might find that this function is hot, but optimizations can come later.
> Source/WebCore/xml/parser/XMLTokenizer.cpp:472
> +    DEFINE_STATIC_LOCAL(String, xmlString, ("xml "));
Is the space necessary after the xml? There can't be other sorts of whitespace?
> Source/WebCore/xml/parser/XMLTokenizer.h:110
> +    bool gotError() const { return m_parseError; }
I'm not in love with gotError as a name. Maybe errorDuringParsing? In any case, you should rename the member variable to match the accessor.
> Source/WebCore/xml/parser/XMLTokenizer.h:112
> +    virtual bool shouldSkipNullCharacters() const { return false; }
I thought this wasn't virtual anymore?
Comment on attachment 101957 [details] Patch (modulo comments above)
Attachment 101957 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/ChangeLog', u'Source/WebCor..." exit_code: 1 Source/WebCore/xml/parser/XMLTokenizer.cpp:227: Non-label code inside switch statements should be indented. [whitespace/indent] [4] Total errors found: 1 in 11 files If any of these errors are false positives, please file a bug against check-webkit-style.
(In reply to comment #7)
> (From update of attachment 101943 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=101943&action=review
>
> This looks like a great start. Obviously, it would be nice for it to build before landing. :)
That's what I get for only checking to make sure it builds in Debug!
> > Source/WebCore/xml/parser/XMLTokenizer.cpp:55
> > +inline bool isLetter(UChar cc)
> > +{
> > +    return (cc >= 'A' && cc <= 'Z') || (cc >= 'a' && cc <= 'z');
> > +}
>
> I think these functions already exist in ASCIITypes.h
I see an ASCIICTypes.h, which uses unsigned short and not UChar. Close enough?
> > Source/WebCore/xml/parser/XMLTokenizer.cpp:472
> > +    DEFINE_STATIC_LOCAL(String, xmlString, ("xml "));
>
> Is the space necessary after the xml? There can't be other sorts of whitespace?
Good catch. This can be any whitespace accepted by the tokenizer as whitespace. I'll fix that.
> > Source/WebCore/xml/parser/XMLTokenizer.h:112
> > +    virtual bool shouldSkipNullCharacters() const { return false; }
>
> I thought this wasn't virtual anymore?
Whoops, this snuck in under my radar. I implemented the tokenizer before I re-optimized the merged token class, and forgot to remove this.
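For illustration only, here is a minimal sketch of the whitespace point above: matching "xml" followed by any whitespace character rather than a literal space. It is not the code from the patch. It uses the standard library (C++17) instead of WTF::String and UChar, the helper names isXMLSpace and startsWithXMLDeclarationTarget are hypothetical, and it assumes the tokenizer's notion of whitespace is the XML 1.0 set (space, tab, CR, LF).

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Assumption: whitespace means the XML 1.0 "S" production
// (space, tab, carriage return, line feed).
inline bool isXMLSpace(char16_t c)
{
    return c == u' ' || c == u'\t' || c == u'\r' || c == u'\n';
}

// Hypothetical helper: true if `input` starts with "xml" immediately
// followed by any XML whitespace character, not only U+0020.
inline bool startsWithXMLDeclarationTarget(std::u16string_view input)
{
    constexpr std::u16string_view target = u"xml";
    if (input.size() <= target.size())
        return false;
    if (input.substr(0, target.size()) != target)
        return false;
    return isXMLSpace(input[target.size()]);
}
```

Comparing against the bare "xml" and then testing the next character separately avoids baking one particular whitespace character into the static string.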
Comment on attachment 101957 [details] Patch Attachment 101957 [details] did not pass gtk-ews (gtk): Output: http://queues.webkit.org/results/9249240
Created attachment 102024 [details] Patch
Attachment 102024 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/ChangeLog', u'Source/WebCor..." exit_code: 1 Source/WebCore/xml/parser/XMLTokenizer.cpp:212: Non-label code inside switch statements should be indented. [whitespace/indent] [4] Total errors found: 1 in 11 files If any of these errors are false positives, please file a bug against check-webkit-style.
Comment on attachment 102024 [details] Patch I realize now that I forgot to fix one of the problems mentioned in abarth's previous comments.
Created attachment 102074 [details] Patch for landing
Attachment 102074 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/ChangeLog', u'Source/WebCor..." exit_code: 1 Source/WebCore/xml/parser/XMLTokenizer.cpp:212: Non-label code inside switch statements should be indented. [whitespace/indent] [4] Source/WebCore/xml/parser/XMLTokenizer.cpp:490: An else should appear on the same line as the preceding } [whitespace/newline] [4] Total errors found: 2 in 11 files If any of these errors are false positives, please file a bug against check-webkit-style.
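For readers unfamiliar with the two rules named in the style-queue output, the sketch below shows the layout they ask for: case labels aligned with the switch, non-label code indented one level below the label, and else on the same line as the preceding closing brace. The State enum and the nextDepth function are hypothetical and are not taken from XMLTokenizer.cpp; whether the flagged lines were real violations or false positives is not visible in this thread.

```cpp
#include <cassert>

enum class State { Data, TagOpen };

// Hypothetical function, present only to illustrate the layout.
inline int nextDepth(State state, char c, int depth)
{
    switch (state) {
    case State::Data:
        // Non-label code sits one indent level below the case label.
        if (c == '<') {
            state = State::TagOpen;
            ++depth;
        } else {
            // The else shares a line with the preceding closing brace.
            state = State::Data;
            depth = 0;
        }
        break;
    case State::TagOpen:
        --depth;
        break;
    }
    return depth;
}
```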
Comment on attachment 102074 [details] Patch for landing Clearing flags on attachment: 102074 Committed r91811: <http://trac.webkit.org/changeset/91811>
All reviewed patches have been landed. Closing bug.