WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
Bug 40048
Make HTML5Lexer go fast
https://bugs.webkit.org/show_bug.cgi?id=40048
Summary
Make HTML5Lexer go fast
Adam Barth
Reported
2010-06-02 02:33:36 PDT
Make HTML5Lexer go fast
Attachments
work in progress
(93.99 KB, patch)
2010-06-02 02:35 PDT
,
Adam Barth
no flags
Details
Formatted Diff
Diff
more work in progress
(96.32 KB, patch)
2010-06-02 21:43 PDT
,
Adam Barth
no flags
Details
Formatted Diff
Diff
Patch
(94.54 KB, patch)
2010-06-03 00:50 PDT
,
Adam Barth
no flags
Details
Formatted Diff
Diff
Diff w/o changing indent
(77.03 KB, patch)
2010-06-04 00:52 PDT
,
Eric Seidel (no email)
no flags
Details
Formatted Diff
Diff
Patch
(10.45 KB, patch)
2010-06-04 15:24 PDT
,
Adam Barth
eric
: review+
Details
Formatted Diff
Diff
Show Obsolete
(4)
View All
Add attachment
proposed patch, testcase, etc.
Adam Barth
Comment 1
2010-06-02 02:35:16 PDT
Created
attachment 57638
[details]
work in progress
Adam Barth
Comment 2
2010-06-02 21:43:03 PDT
Created
attachment 57733
[details]
more work in progress
Adam Barth
Comment 3
2010-06-03 00:08:01 PDT
Looks like ~1% speedup. I was hoping for more. I still think it's worth doing.
Adam Barth
Comment 4
2010-06-03 00:50:31 PDT
Created
attachment 57742
[details]
Patch
WebKit Review Bot
Comment 5
2010-06-03 00:52:07 PDT
Attachment 57742
[details]
did not pass style-queue: Failed to run "['WebKitTools/Scripts/check-webkit-style', '--no-squash']" exit_code: 1 WebCore/html/HTML5Lexer.cpp:357: Non-label code inside switch statements should be indented. [whitespace/indent] [4] Total errors found: 1 in 3 files If any of these errors are false positives, please file a bug against check-webkit-style.
WebKit Review Bot
Comment 6
2010-06-03 01:35:58 PDT
Attachment 57742
[details]
did not build on win: Build output:
http://webkit-commit-queue.appspot.com/results/3045014
Eric Seidel (no email)
Comment 7
2010-06-04 00:33:34 PDT
Comment on
attachment 57742
[details]
Patch I can't really review a change this large in depth. But I'll give some comments and we should talk more in person. WebCore/html/HTML5Lexer.cpp:346 + #define BEGIN_STATE_MAP() \ Why does this need to be a macro? WebCore/html/HTML5Lexer.cpp:353 + #define END_STATE() ASSERT_NOT_REACHED(); break; I think instead of using END_STATE we should just use blocks { } WebCore/html/HTML5Lexer.cpp:420 + if (cc == '&') If we didn't change the indent in this change it would be easier to read. WebCore/html/HTML5Lexer.cpp:428 + ADVANCE_TO(TagOpenState); Seems we could do all the ADVANCE_TO search/replace in a separate change to minimize this diff. I think this is great. It just needs to be done in smaller pieces if you want more substantial comments. I like the idea of using macros. I like the idea of using gotos.
Eric Seidel (no email)
Comment 8
2010-06-04 00:52:26 PDT
Created
attachment 57850
[details]
Diff w/o changing indent
Adam Barth
Comment 9
2010-06-04 09:32:47 PDT
> WebCore/html/HTML5Lexer.cpp:346 > + #define BEGIN_STATE_MAP() \ > Why does this need to be a macro?
I just thought it was more balanced, but we don't need to use a macro here.
> WebCore/html/HTML5Lexer.cpp:353 > + #define END_STATE() ASSERT_NOT_REACHED(); break; > I think instead of using END_STATE we should just use blocks { }
Using blocks seems independent of whether we use END_STATE. We still want to assert that we don't ever fall through.
> WebCore/html/HTML5Lexer.cpp:420 > + if (cc == '&') > If we didn't change the indent in this change it would be easier to read.
Thanks for uploading a diff with a different indent.
> WebCore/html/HTML5Lexer.cpp:428 > + ADVANCE_TO(TagOpenState); > Seems we could do all the ADVANCE_TO search/replace in a separate change to minimize this diff.
ok.
Adam Barth
Comment 10
2010-06-04 12:06:29 PDT
Committed
r60694
: <
http://trac.webkit.org/changeset/60694
>
Adam Barth
Comment 11
2010-06-04 12:19:53 PDT
Committed
r60696
: <
http://trac.webkit.org/changeset/60696
>
Adam Barth
Comment 12
2010-06-04 12:30:22 PDT
Committed
r60697
: <
http://trac.webkit.org/changeset/60697
>
Adam Barth
Comment 13
2010-06-04 12:38:45 PDT
Committed
r60699
: <
http://trac.webkit.org/changeset/60699
>
Adam Barth
Comment 14
2010-06-04 12:44:38 PDT
Committed
r60701
: <
http://trac.webkit.org/changeset/60701
>
Adam Barth
Comment 15
2010-06-04 12:54:45 PDT
Committed
r60702
: <
http://trac.webkit.org/changeset/60702
>
Adam Barth
Comment 16
2010-06-04 13:00:54 PDT
Committed
r60703
: <
http://trac.webkit.org/changeset/60703
>
Adam Barth
Comment 17
2010-06-04 13:06:33 PDT
Committed
r60704
: <
http://trac.webkit.org/changeset/60704
>
Adam Barth
Comment 18
2010-06-04 14:56:54 PDT
Committed
r60711
: <
http://trac.webkit.org/changeset/60711
>
Adam Barth
Comment 19
2010-06-04 15:12:51 PDT
Committed
r60712
: <
http://trac.webkit.org/changeset/60712
>
Adam Barth
Comment 20
2010-06-04 15:24:54 PDT
Created
attachment 57921
[details]
Patch
Adam Barth
Comment 21
2010-06-04 15:25:45 PDT
Ok, after landing like a billion wind-up patches, this patch should hopefully be reviewable now.
WebKit Review Bot
Comment 22
2010-06-04 15:58:17 PDT
http://trac.webkit.org/changeset/60712
might have broken GTK Linux 32-bit Release The following changes are on the blame list:
http://trac.webkit.org/changeset/60712
http://trac.webkit.org/changeset/60713
http://trac.webkit.org/changeset/60709
http://trac.webkit.org/changeset/60710
http://trac.webkit.org/changeset/60711
WebKit Review Bot
Comment 23
2010-06-04 15:58:51 PDT
http://trac.webkit.org/changeset/60711
might have broken GTK Linux 32-bit Release The following changes are on the blame list:
http://trac.webkit.org/changeset/60712
http://trac.webkit.org/changeset/60713
http://trac.webkit.org/changeset/60709
http://trac.webkit.org/changeset/60710
http://trac.webkit.org/changeset/60711
Eric Seidel (no email)
Comment 24
2010-06-04 22:24:50 PDT
Comment on
attachment 57921
[details]
Patch WebCore/html/HTML5Lexer.cpp:323 + #define RECONSUME_IN(StateName) \ Intentionally changing which column the \ is in? WebCore/html/HTML5Lexer.cpp:327 + goto StateName; \ And the case of StateName? WebCore/html/HTML5Lexer.cpp:334 + if (source.isEmpty()) \ Why does this check go after the advance and not before? (aka, after processing)? WebCore/html/HTML5Lexer.cpp:399 + if (source.isEmpty()) You write this check in two places. Should it be a macro?
Eric Seidel (no email)
Comment 25
2010-06-04 22:26:28 PDT
Comment on
attachment 57921
[details]
Patch Would like an answer to the "why does the isEmpty check go there" question before landing, but in general looks good!.
Adam Barth
Comment 26
2010-06-04 23:29:40 PDT
> WebCore/html/HTML5Lexer.cpp:323 > + #define RECONSUME_IN(StateName) \ > Intentionally changing which column the \ is in?
Now, that's an artifact of merging back in code from the ASAD patch.
> WebCore/html/HTML5Lexer.cpp:327 > + goto StateName; \ > And the case of StateName?
Same here. I'll change this one back.
> WebCore/html/HTML5Lexer.cpp:334 > + if (source.isEmpty()) \ > Why does this check go after the advance and not before? (aka, after processing)?
We process the lookahead character and then advance when we're done "consuming" it. If we've just advanced over the last character of the string, there's no lookahead character and nothing left to do, we so we return. The spec has a mix of when it decides to consume the lookahead character. I went with this design because it's what the PreloadScanner does. Now that we're using gotos instead of a jump table, we can do this verbatim the way the spec works, if you like, which would mean having a CONSUME macro at the beginning of most (but not all) states. I don't think there's a big difference one way or the other. We might go with whatever is faster.
> WebCore/html/HTML5Lexer.cpp:399 > + if (source.isEmpty()) > You write this check in two places. Should it be a macro?
Maybe. It tried to keep the macros under control, but we can proliferate them more if you like.
Adam Barth
Comment 27
2010-06-04 23:58:08 PDT
Committed
r60739
: <
http://trac.webkit.org/changeset/60739
>
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug