RESOLVED FIXED Bug 40048
Make HTML5Lexer go fast
https://bugs.webkit.org/show_bug.cgi?id=40048
Summary Make HTML5Lexer go fast
Adam Barth
Reported 2010-06-02 02:33:36 PDT
Make HTML5Lexer go fast
Attachments
work in progress (93.99 KB, patch)
2010-06-02 02:35 PDT, Adam Barth
no flags
more work in progress (96.32 KB, patch)
2010-06-02 21:43 PDT, Adam Barth
no flags
Patch (94.54 KB, patch)
2010-06-03 00:50 PDT, Adam Barth
no flags
Diff w/o changing indent (77.03 KB, patch)
2010-06-04 00:52 PDT, Eric Seidel (no email)
no flags
Patch (10.45 KB, patch)
2010-06-04 15:24 PDT, Adam Barth
eric: review+
Adam Barth
Comment 1 2010-06-02 02:35:16 PDT
Created attachment 57638 [details] work in progress
Adam Barth
Comment 2 2010-06-02 21:43:03 PDT
Created attachment 57733 [details] more work in progress
Adam Barth
Comment 3 2010-06-03 00:08:01 PDT
Looks like ~1% speedup. I was hoping for more. I still think it's worth doing.
Adam Barth
Comment 4 2010-06-03 00:50:31 PDT
WebKit Review Bot
Comment 5 2010-06-03 00:52:07 PDT
Attachment 57742 [details] did not pass style-queue: Failed to run "['WebKitTools/Scripts/check-webkit-style', '--no-squash']" exit_code: 1 WebCore/html/HTML5Lexer.cpp:357: Non-label code inside switch statements should be indented. [whitespace/indent] [4] Total errors found: 1 in 3 files If any of these errors are false positives, please file a bug against check-webkit-style.
WebKit Review Bot
Comment 6 2010-06-03 01:35:58 PDT
Eric Seidel (no email)
Comment 7 2010-06-04 00:33:34 PDT
Comment on attachment 57742 [details] Patch I can't really review a change this large in depth. But I'll give some comments and we should talk more in person. WebCore/html/HTML5Lexer.cpp:346 + #define BEGIN_STATE_MAP() \ Why does this need to be a macro? WebCore/html/HTML5Lexer.cpp:353 + #define END_STATE() ASSERT_NOT_REACHED(); break; I think instead of using END_STATE we should just use blocks { } WebCore/html/HTML5Lexer.cpp:420 + if (cc == '&') If we didn't change the indent in this change it would be easier to read. WebCore/html/HTML5Lexer.cpp:428 + ADVANCE_TO(TagOpenState); Seems we could do all the ADVANCE_TO search/replace in a separate change to minimize this diff. I think this is great. It just needs to be done in smaller pieces if you want more substantial comments. I like the idea of using macros. I like the idea of using gotos.
Eric Seidel (no email)
Comment 8 2010-06-04 00:52:26 PDT
Created attachment 57850 [details] Diff w/o changing indent
Adam Barth
Comment 9 2010-06-04 09:32:47 PDT
> WebCore/html/HTML5Lexer.cpp:346 > + #define BEGIN_STATE_MAP() \ > Why does this need to be a macro? I just thought it was more balanced, but we don't need to use a macro here. > WebCore/html/HTML5Lexer.cpp:353 > + #define END_STATE() ASSERT_NOT_REACHED(); break; > I think instead of using END_STATE we should just use blocks { } Using blocks seems independent of whether we use END_STATE. We still want to assert that we don't ever fall through. > WebCore/html/HTML5Lexer.cpp:420 > + if (cc == '&') > If we didn't change the indent in this change it would be easier to read. Thanks for uploading a diff with a different indent. > WebCore/html/HTML5Lexer.cpp:428 > + ADVANCE_TO(TagOpenState); > Seems we could do all the ADVANCE_TO search/replace in a separate change to minimize this diff. ok.
Adam Barth
Comment 10 2010-06-04 12:06:29 PDT
Adam Barth
Comment 11 2010-06-04 12:19:53 PDT
Adam Barth
Comment 12 2010-06-04 12:30:22 PDT
Adam Barth
Comment 13 2010-06-04 12:38:45 PDT
Adam Barth
Comment 14 2010-06-04 12:44:38 PDT
Adam Barth
Comment 15 2010-06-04 12:54:45 PDT
Adam Barth
Comment 16 2010-06-04 13:00:54 PDT
Adam Barth
Comment 17 2010-06-04 13:06:33 PDT
Adam Barth
Comment 18 2010-06-04 14:56:54 PDT
Adam Barth
Comment 19 2010-06-04 15:12:51 PDT
Adam Barth
Comment 20 2010-06-04 15:24:54 PDT
Adam Barth
Comment 21 2010-06-04 15:25:45 PDT
Ok, after landing like a billion wind-up patches, this patch should hopefully be reviewable now.
Eric Seidel (no email)
Comment 24 2010-06-04 22:24:50 PDT
Comment on attachment 57921 [details] Patch WebCore/html/HTML5Lexer.cpp:323 + #define RECONSUME_IN(StateName) \ Intentionally changing which column the \ is in? WebCore/html/HTML5Lexer.cpp:327 + goto StateName; \ And the case of StateName? WebCore/html/HTML5Lexer.cpp:334 + if (source.isEmpty()) \ Why does this check go after the advance and not before? (aka, after processing)? WebCore/html/HTML5Lexer.cpp:399 + if (source.isEmpty()) You write this check in two places. Should it be a macro?
Eric Seidel (no email)
Comment 25 2010-06-04 22:26:28 PDT
Comment on attachment 57921 [details] Patch Would like an answer to the "why does the isEmpty check go there" question before landing, but in general looks good!.
Adam Barth
Comment 26 2010-06-04 23:29:40 PDT
> WebCore/html/HTML5Lexer.cpp:323 > + #define RECONSUME_IN(StateName) \ > Intentionally changing which column the \ is in? Now, that's an artifact of merging back in code from the ASAD patch. > WebCore/html/HTML5Lexer.cpp:327 > + goto StateName; \ > And the case of StateName? Same here. I'll change this one back. > WebCore/html/HTML5Lexer.cpp:334 > + if (source.isEmpty()) \ > Why does this check go after the advance and not before? (aka, after processing)? We process the lookahead character and then advance when we're done "consuming" it. If we've just advanced over the last character of the string, there's no lookahead character and nothing left to do, we so we return. The spec has a mix of when it decides to consume the lookahead character. I went with this design because it's what the PreloadScanner does. Now that we're using gotos instead of a jump table, we can do this verbatim the way the spec works, if you like, which would mean having a CONSUME macro at the beginning of most (but not all) states. I don't think there's a big difference one way or the other. We might go with whatever is faster. > WebCore/html/HTML5Lexer.cpp:399 > + if (source.isEmpty()) > You write this check in two places. Should it be a macro? Maybe. It tried to keep the macros under control, but we can proliferate them more if you like.
Adam Barth
Comment 27 2010-06-04 23:58:08 PDT
Note You need to log in before you can comment on or make changes to this bug.