WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
64566
New XML tokenizer
https://bugs.webkit.org/show_bug.cgi?id=64566
Summary
New XML tokenizer
Vicki Pfau
Reported
2011-07-14 15:31:29 PDT
For the new XML parser, we'll need a tokenizer for the source before we can parse the tokens.
Attachments
Patch
(52.30 KB, patch)
2011-07-25 17:04 PDT
,
Vicki Pfau
no flags
Details
Formatted Diff
Diff
Patch
(52.33 KB, patch)
2011-07-25 18:19 PDT
,
Vicki Pfau
no flags
Details
Formatted Diff
Diff
Patch
(53.52 KB, patch)
2011-07-26 10:59 PDT
,
Vicki Pfau
no flags
Details
Formatted Diff
Diff
Patch for landing
(54.23 KB, patch)
2011-07-26 16:54 PDT
,
Vicki Pfau
no flags
Details
Formatted Diff
Diff
Show Obsolete
(3)
View All
Add attachment
proposed patch, testcase, etc.
Vicki Pfau
Comment 1
2011-07-25 17:04:43 PDT
Created
attachment 101943
[details]
Patch
WebKit Review Bot
Comment 2
2011-07-25 17:07:01 PDT
Attachment 101943
[details]
did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/ChangeLog', u'Source/WebCor..." exit_code: 1 Source/WebCore/xml/parser/XMLTokenizer.cpp:227: Non-label code inside switch statements should be indented. [whitespace/indent] [4] Total errors found: 1 in 11 files If any of these errors are false positives, please file a bug against check-webkit-style.
WebKit Review Bot
Comment 3
2011-07-25 17:55:07 PDT
Comment on
attachment 101943
[details]
Patch
Attachment 101943
[details]
did not pass cr-mac-ews (chromium): Output:
http://queues.webkit.org/results/9253101
WebKit Review Bot
Comment 4
2011-07-25 18:09:23 PDT
Comment on
attachment 101943
[details]
Patch
Attachment 101943
[details]
did not pass mac-ews (mac): Output:
http://queues.webkit.org/results/9253107
WebKit Review Bot
Comment 5
2011-07-25 18:14:03 PDT
Comment on
attachment 101943
[details]
Patch
Attachment 101943
[details]
did not pass chromium-ews (chromium-xvfb): Output:
http://queues.webkit.org/results/9250169
Vicki Pfau
Comment 6
2011-07-25 18:19:39 PDT
Created
attachment 101957
[details]
Patch
Adam Barth
Comment 7
2011-07-25 18:21:01 PDT
Comment on
attachment 101943
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=101943&action=review
This looks like a great start. Obviously, it would be nice for it to build before landing. :)
> Source/WebCore/xml/parser/NewXMLDocumentParser.cpp:62 > + m_token.print();
DEBUG only? This function will obviously evolve. :)
> Source/WebCore/xml/parser/XMLTokenizer.cpp:47 > +inline bool isNumeric(UChar cc)
cc => c (I need to go through the HTML parser and fix this there too.)
> Source/WebCore/xml/parser/XMLTokenizer.cpp:55 > +inline bool isLetter(UChar cc) > +{ > + return (cc >= 'A' && cc <= 'Z') || (cc >= 'a' && cc <= 'z'); > +}
I think these functions already exist in ASCIITypes.h
> Source/WebCore/xml/parser/XMLTokenizer.cpp:62 > +inline bool isValidNameStart(UChar cc)
You might find that this function is hot, but optimizations can come later.
> Source/WebCore/xml/parser/XMLTokenizer.cpp:472 > + DEFINE_STATIC_LOCAL(String, xmlString, ("xml "));
Is the space necessary after the xml? There can't be other sorts of whitespace?
> Source/WebCore/xml/parser/XMLTokenizer.h:110 > + bool gotError() const { return m_parseError; }
I'm not in love with gotError as a name. Maybe errorDuringParsing ? In any case, you should rename the member variable to match the accessor.
> Source/WebCore/xml/parser/XMLTokenizer.h:112 > + virtual bool shouldSkipNullCharacters() const { return false; }
I thought this wasn't virtual anymore?
Adam Barth
Comment 8
2011-07-25 18:21:28 PDT
Comment on
attachment 101957
[details]
Patch (modulo comments above)
WebKit Review Bot
Comment 9
2011-07-25 18:23:49 PDT
Attachment 101957
[details]
did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/ChangeLog', u'Source/WebCor..." exit_code: 1 Source/WebCore/xml/parser/XMLTokenizer.cpp:227: Non-label code inside switch statements should be indented. [whitespace/indent] [4] Total errors found: 1 in 11 files If any of these errors are false positives, please file a bug against check-webkit-style.
Vicki Pfau
Comment 10
2011-07-25 18:37:55 PDT
(In reply to
comment #7
)
> (From update of
attachment 101943
[details]
) > View in context:
https://bugs.webkit.org/attachment.cgi?id=101943&action=review
> > This looks like a great start. Obviously, it would be nice for it to build before landing. :)
That's what I get for only checking to make sure it builds in Debug!
> > Source/WebCore/xml/parser/XMLTokenizer.cpp:55 > > +inline bool isLetter(UChar cc) > > +{ > > + return (cc >= 'A' && cc <= 'Z') || (cc >= 'a' && cc <= 'z'); > > +} > > I think these functions already exist in ASCIITypes.h
I see an ASCIICTypes.h, which uses unsigned short and not UChar. Close enough?
> > Source/WebCore/xml/parser/XMLTokenizer.cpp:472 > > + DEFINE_STATIC_LOCAL(String, xmlString, ("xml ")); > > Is the space necessary after the xml? There can't be other sorts of whitespace?
Good catch. This can be any whitespace accepted by the tokenizer as whitespace. I'll fix that.
> > Source/WebCore/xml/parser/XMLTokenizer.h:112 > > + virtual bool shouldSkipNullCharacters() const { return false; } > > I thought this wasn't virtual anymore?
Whoops, this snuck in under my radar. I implemented the tokenizer before I re-optimized the merged token class, and forgot to remove this.
Collabora GTK+ EWS bot
Comment 11
2011-07-25 19:21:25 PDT
Comment on
attachment 101957
[details]
Patch
Attachment 101957
[details]
did not pass gtk-ews (gtk): Output:
http://queues.webkit.org/results/9249240
Vicki Pfau
Comment 12
2011-07-26 10:59:14 PDT
Created
attachment 102024
[details]
Patch
WebKit Review Bot
Comment 13
2011-07-26 11:02:19 PDT
Attachment 102024
[details]
did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/ChangeLog', u'Source/WebCor..." exit_code: 1 Source/WebCore/xml/parser/XMLTokenizer.cpp:212: Non-label code inside switch statements should be indented. [whitespace/indent] [4] Total errors found: 1 in 11 files If any of these errors are false positives, please file a bug against check-webkit-style.
Vicki Pfau
Comment 14
2011-07-26 15:09:19 PDT
Comment on
attachment 102024
[details]
Patch I realize now I forgot to fix one of the problems mentioned by abarth's previous comments
Vicki Pfau
Comment 15
2011-07-26 16:54:54 PDT
Created
attachment 102074
[details]
Patch for landing
WebKit Review Bot
Comment 16
2011-07-26 16:57:43 PDT
Attachment 102074
[details]
did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/WebCore/ChangeLog', u'Source/WebCor..." exit_code: 1 Source/WebCore/xml/parser/XMLTokenizer.cpp:212: Non-label code inside switch statements should be indented. [whitespace/indent] [4] Source/WebCore/xml/parser/XMLTokenizer.cpp:490: An else should appear on the same line as the preceding } [whitespace/newline] [4] Total errors found: 2 in 11 files If any of these errors are false positives, please file a bug against check-webkit-style.
WebKit Review Bot
Comment 17
2011-07-26 19:25:58 PDT
Comment on
attachment 102074
[details]
Patch for landing Clearing flags on attachment: 102074 Committed
r91811
: <
http://trac.webkit.org/changeset/91811
>
WebKit Review Bot
Comment 18
2011-07-26 19:26:04 PDT
All reviewed patches have been landed. Closing bug.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug