Make HTML5Lexer go fast
Created attachment 57638 [details] work in progress
Created attachment 57733 [details] more work in progress
Looks like ~1% speedup. I was hoping for more. I still think it's worth doing.
Created attachment 57742 [details] Patch
Attachment 57742 [details] did not pass style-queue: Failed to run "['WebKitTools/Scripts/check-webkit-style', '--no-squash']" exit_code: 1 WebCore/html/HTML5Lexer.cpp:357: Non-label code inside switch statements should be indented. [whitespace/indent] [4] Total errors found: 1 in 3 files If any of these errors are false positives, please file a bug against check-webkit-style.
Attachment 57742 [details] did not build on win: Build output: http://webkit-commit-queue.appspot.com/results/3045014
Comment on attachment 57742 [details] Patch I can't really review a change this large in depth. But I'll give some comments and we should talk more in person. WebCore/html/HTML5Lexer.cpp:346 + #define BEGIN_STATE_MAP() \ Why does this need to be a macro? WebCore/html/HTML5Lexer.cpp:353 + #define END_STATE() ASSERT_NOT_REACHED(); break; I think instead of using END_STATE we should just use blocks { } WebCore/html/HTML5Lexer.cpp:420 + if (cc == '&') If we didn't change the indent in this change it would be easier to read. WebCore/html/HTML5Lexer.cpp:428 + ADVANCE_TO(TagOpenState); Seems we could do all the ADVANCE_TO search/replace in a separate change to minimize this diff. I think this is great. It just needs to be done in smaller pieces if you want more substantial comments. I like the idea of using macros. I like the idea of using gotos.
Created attachment 57850 [details] Diff w/o changing indent
> WebCore/html/HTML5Lexer.cpp:346 > + #define BEGIN_STATE_MAP() \ > Why does this need to be a macro? I just thought it was more balanced, but we don't need to use a macro here. > WebCore/html/HTML5Lexer.cpp:353 > + #define END_STATE() ASSERT_NOT_REACHED(); break; > I think instead of using END_STATE we should just use blocks { } Using blocks seems independent of whether we use END_STATE. We still want to assert that we don't ever fall through. > WebCore/html/HTML5Lexer.cpp:420 > + if (cc == '&') > If we didn't change the indent in this change it would be easier to read. Thanks for uploading a diff with a different indent. > WebCore/html/HTML5Lexer.cpp:428 > + ADVANCE_TO(TagOpenState); > Seems we could do all the ADVANCE_TO search/replace in a separate change to minimize this diff. ok.
Committed r60694: <http://trac.webkit.org/changeset/60694>
Committed r60696: <http://trac.webkit.org/changeset/60696>
Committed r60697: <http://trac.webkit.org/changeset/60697>
Committed r60699: <http://trac.webkit.org/changeset/60699>
Committed r60701: <http://trac.webkit.org/changeset/60701>
Committed r60702: <http://trac.webkit.org/changeset/60702>
Committed r60703: <http://trac.webkit.org/changeset/60703>
Committed r60704: <http://trac.webkit.org/changeset/60704>
Committed r60711: <http://trac.webkit.org/changeset/60711>
Committed r60712: <http://trac.webkit.org/changeset/60712>
Created attachment 57921 [details] Patch
Ok, after landing like a billion wind-up patches, this patch should hopefully be reviewable now.
http://trac.webkit.org/changeset/60712 might have broken GTK Linux 32-bit Release The following changes are on the blame list: http://trac.webkit.org/changeset/60712 http://trac.webkit.org/changeset/60713 http://trac.webkit.org/changeset/60709 http://trac.webkit.org/changeset/60710 http://trac.webkit.org/changeset/60711
http://trac.webkit.org/changeset/60711 might have broken GTK Linux 32-bit Release The following changes are on the blame list: http://trac.webkit.org/changeset/60712 http://trac.webkit.org/changeset/60713 http://trac.webkit.org/changeset/60709 http://trac.webkit.org/changeset/60710 http://trac.webkit.org/changeset/60711
Comment on attachment 57921 [details] Patch WebCore/html/HTML5Lexer.cpp:323 + #define RECONSUME_IN(StateName) \ Intentionally changing which column the \ is in? WebCore/html/HTML5Lexer.cpp:327 + goto StateName; \ And the case of StateName? WebCore/html/HTML5Lexer.cpp:334 + if (source.isEmpty()) \ Why does this check go after the advance and not before? (aka, after processing)? WebCore/html/HTML5Lexer.cpp:399 + if (source.isEmpty()) You write this check in two places. Should it be a macro?
Comment on attachment 57921 [details] Patch Would like an answer to the "why does the isEmpty check go there" question before landing, but in general looks good!.
> WebCore/html/HTML5Lexer.cpp:323 > + #define RECONSUME_IN(StateName) \ > Intentionally changing which column the \ is in? Now, that's an artifact of merging back in code from the ASAD patch. > WebCore/html/HTML5Lexer.cpp:327 > + goto StateName; \ > And the case of StateName? Same here. I'll change this one back. > WebCore/html/HTML5Lexer.cpp:334 > + if (source.isEmpty()) \ > Why does this check go after the advance and not before? (aka, after processing)? We process the lookahead character and then advance when we're done "consuming" it. If we've just advanced over the last character of the string, there's no lookahead character and nothing left to do, we so we return. The spec has a mix of when it decides to consume the lookahead character. I went with this design because it's what the PreloadScanner does. Now that we're using gotos instead of a jump table, we can do this verbatim the way the spec works, if you like, which would mean having a CONSUME macro at the beginning of most (but not all) states. I don't think there's a big difference one way or the other. We might go with whatever is faster. > WebCore/html/HTML5Lexer.cpp:399 > + if (source.isEmpty()) > You write this check in two places. Should it be a macro? Maybe. It tried to keep the macros under control, but we can proliferate them more if you like.
Committed r60739: <http://trac.webkit.org/changeset/60739>