Bug 40199 - HTML5 parser should normalize line endings
Summary: HTML5 parser should normalize line endings
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: 528+ (Nightly build)
Hardware: Other OS X 10.5
: P2 Normal
Assignee: Adam Barth
URL:
Keywords:
Depends on:
Blocks: 39259
  Show dependency treegraph
 
Reported: 2010-06-04 19:31 PDT by Adam Barth
Modified: 2010-06-07 11:34 PDT (History)
1 user (show)

See Also:


Attachments
Patch (12.59 KB, patch)
2010-06-04 19:48 PDT, Adam Barth
no flags Details | Formatted Diff | Diff
Patch (14.36 KB, patch)
2010-06-07 11:19 PDT, Adam Barth
eric: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Adam Barth 2010-06-04 19:31:28 PDT
HTML5 parser should normalize line endings
Comment 1 Adam Barth 2010-06-04 19:48:14 PDT
Created attachment 57947 [details]
Patch
Comment 2 Eric Seidel (no email) 2010-06-04 22:50:08 PDT
Comment on attachment 57947 [details]
Patch

WebCore/html/HTML5Lexer.cpp:335
 +  #define PEEK_AND_RECONSUME_IN(StateName)                                   \
Please add comments to these macros about when to use which.  It's not clear when to use PEEK_AND_ vs. normal RECONSUME

WebCore/html/HTML5Lexer.cpp:338
 +          if (!m_inputStreamPreprocessor.peek(source, m_lineNumber))         \
This might want to be a macro itself, sicne it's used other places.  EXIT_IF_PEEK_FAIL or something?

WebCore/html/HTML5Lexer.cpp:413
 +      if (m_skipLeadingNewLineForListing) {
We might want to document in the header why both the input stream preprocessor "skip next newline" and this m_skipLeadingNewLineForListing are needed.  They're both needed.  Mostly to cover the <pre>\r\n case cleanly.

WebCore/html/HTML5Lexer.cpp:1160
 +              // We ignore the return value because it's checked by the loop.
This comment is not clear.  What's the return value?  How is it checked by which loop?

WebCore/html/HTML5Lexer.h:138
 +              UChar nextInputCharacter() { return m_nextInputCharacter; }
const?

WebCore/html/HTML5Lexer.h:136
 +              InputStreamPreprocessor() : m_nextInputCharacter('\0'), m_skipNextNewLine(false) { }
separate lines?

WebCore/html/HTML5Lexer.h:147
 +                          return false;
So I'm not sure what a false vs. true return here mean.  False means that there is not enough data in the stream to be able to do the \r\n handling?

This just isn't clear.  You talked about adding more ASSERTs in InputStreamPreprocessor to prevent double-peeking too.
Comment 3 Adam Barth 2010-06-07 11:19:52 PDT
Created attachment 58051 [details]
Patch
Comment 4 Eric Seidel (no email) 2010-06-07 11:30:53 PDT
Comment on attachment 58051 [details]
Patch

Looks OK.  Thank you for all the extra commenting.
Comment 5 Adam Barth 2010-06-07 11:34:18 PDT
Committed r60790: <http://trac.webkit.org/changeset/60790>