RESOLVED FIXED 64566
New XML tokenizer
https://bugs.webkit.org/show_bug.cgi?id=64566
Summary New XML tokenizer
Vicki Pfau
Reported 2011-07-14 15:31:29 PDT
For the new XML parser, we'll need a tokenizer for the source before we can parse the tokens.
Attachments
Patch (52.30 KB, patch)
2011-07-25 17:04 PDT, Vicki Pfau
no flags
Patch (52.33 KB, patch)
2011-07-25 18:19 PDT, Vicki Pfau
no flags
Patch (53.52 KB, patch)
2011-07-26 10:59 PDT, Vicki Pfau
no flags
Patch for landing (54.23 KB, patch)
2011-07-26 16:54 PDT, Vicki Pfau
no flags
Vicki Pfau
Comment 1 2011-07-25 17:04:43 PDT
WebKit Review Bot
Comment 2 2011-07-25 17:07:01 PDT
Attachment 101943 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/ChangeLog', u'Source/WebCor..." exit_code: 1 Source/WebCore/xml/parser/XMLTokenizer.cpp:227: Non-label code inside switch statements should be indented. [whitespace/indent] [4] Total errors found: 1 in 11 files If any of these errors are false positives, please file a bug against check-webkit-style.
WebKit Review Bot
Comment 3 2011-07-25 17:55:07 PDT
Comment on attachment 101943 [details] Patch Attachment 101943 [details] did not pass cr-mac-ews (chromium): Output: http://queues.webkit.org/results/9253101
WebKit Review Bot
Comment 4 2011-07-25 18:09:23 PDT
WebKit Review Bot
Comment 5 2011-07-25 18:14:03 PDT
Comment on attachment 101943 [details] Patch Attachment 101943 [details] did not pass chromium-ews (chromium-xvfb): Output: http://queues.webkit.org/results/9250169
Vicki Pfau
Comment 6 2011-07-25 18:19:39 PDT
Adam Barth
Comment 7 2011-07-25 18:21:01 PDT
Comment on attachment 101943 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=101943&action=review This looks like a great start. Obviously, it would be nice for it to build before landing. :) > Source/WebCore/xml/parser/NewXMLDocumentParser.cpp:62 > + m_token.print(); DEBUG only? This function will obviously evolve. :) > Source/WebCore/xml/parser/XMLTokenizer.cpp:47 > +inline bool isNumeric(UChar cc) cc => c (I need to go through the HTML parser and fix this there too.) > Source/WebCore/xml/parser/XMLTokenizer.cpp:55 > +inline bool isLetter(UChar cc) > +{ > + return (cc >= 'A' && cc <= 'Z') || (cc >= 'a' && cc <= 'z'); > +} I think these functions already exist in ASCIITypes.h > Source/WebCore/xml/parser/XMLTokenizer.cpp:62 > +inline bool isValidNameStart(UChar cc) You might find that this function is hot, but optimizations can come later. > Source/WebCore/xml/parser/XMLTokenizer.cpp:472 > + DEFINE_STATIC_LOCAL(String, xmlString, ("xml ")); Is the space necessary after the xml? There can't be other sorts of whitespace? > Source/WebCore/xml/parser/XMLTokenizer.h:110 > + bool gotError() const { return m_parseError; } I'm not in love with gotError as a name. Maybe errorDuringParsing ? In any case, you should rename the member variable to match the accessor. > Source/WebCore/xml/parser/XMLTokenizer.h:112 > + virtual bool shouldSkipNullCharacters() const { return false; } I thought this wasn't virtual anymore?
Adam Barth
Comment 8 2011-07-25 18:21:28 PDT
Comment on attachment 101957 [details] Patch (modulo comments above)
WebKit Review Bot
Comment 9 2011-07-25 18:23:49 PDT
Attachment 101957 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/ChangeLog', u'Source/WebCor..." exit_code: 1 Source/WebCore/xml/parser/XMLTokenizer.cpp:227: Non-label code inside switch statements should be indented. [whitespace/indent] [4] Total errors found: 1 in 11 files If any of these errors are false positives, please file a bug against check-webkit-style.
Vicki Pfau
Comment 10 2011-07-25 18:37:55 PDT
(In reply to comment #7) > (From update of attachment 101943 [details]) > View in context: https://bugs.webkit.org/attachment.cgi?id=101943&action=review > > This looks like a great start. Obviously, it would be nice for it to build before landing. :) That's what I get for only checking to make sure it builds in Debug! > > Source/WebCore/xml/parser/XMLTokenizer.cpp:55 > > +inline bool isLetter(UChar cc) > > +{ > > + return (cc >= 'A' && cc <= 'Z') || (cc >= 'a' && cc <= 'z'); > > +} > > I think these functions already exist in ASCIITypes.h I see an ASCIICTypes.h, which uses unsigned short and not UChar. Close enough? > > Source/WebCore/xml/parser/XMLTokenizer.cpp:472 > > + DEFINE_STATIC_LOCAL(String, xmlString, ("xml ")); > > Is the space necessary after the xml? There can't be other sorts of whitespace? Good catch. This can be any whitespace accepted by the tokenizer as whitespace. I'll fix that. > > Source/WebCore/xml/parser/XMLTokenizer.h:112 > > + virtual bool shouldSkipNullCharacters() const { return false; } > > I thought this wasn't virtual anymore? Whoops, this snuck in under my radar. I implemented the tokenizer before I re-optimized the merged token class, and forgot to remove this.
Collabora GTK+ EWS bot
Comment 11 2011-07-25 19:21:25 PDT
Vicki Pfau
Comment 12 2011-07-26 10:59:14 PDT
WebKit Review Bot
Comment 13 2011-07-26 11:02:19 PDT
Attachment 102024 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/ChangeLog', u'Source/WebCor..." exit_code: 1 Source/WebCore/xml/parser/XMLTokenizer.cpp:212: Non-label code inside switch statements should be indented. [whitespace/indent] [4] Total errors found: 1 in 11 files If any of these errors are false positives, please file a bug against check-webkit-style.
Vicki Pfau
Comment 14 2011-07-26 15:09:19 PDT
Comment on attachment 102024 [details] Patch I realize now I forgot to fix one of the problems mentioned by abarth's previous comments
Vicki Pfau
Comment 15 2011-07-26 16:54:54 PDT
Created attachment 102074 [details] Patch for landing
WebKit Review Bot
Comment 16 2011-07-26 16:57:43 PDT
Attachment 102074 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/ChangeLog', u'Source/WebCor..." exit_code: 1 Source/WebCore/xml/parser/XMLTokenizer.cpp:212: Non-label code inside switch statements should be indented. [whitespace/indent] [4] Source/WebCore/xml/parser/XMLTokenizer.cpp:490: An else should appear on the same line as the preceding } [whitespace/newline] [4] Total errors found: 2 in 11 files If any of these errors are false positives, please file a bug against check-webkit-style.
WebKit Review Bot
Comment 17 2011-07-26 19:25:58 PDT
Comment on attachment 102074 [details] Patch for landing Clearing flags on attachment: 102074 Committed r91811: <http://trac.webkit.org/changeset/91811>
WebKit Review Bot
Comment 18 2011-07-26 19:26:04 PDT
All reviewed patches have been landed. Closing bug.
Note You need to log in before you can comment on or make changes to this bug.