WebKit Bugzilla
New
Browse
Search+
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
61913
Improve keyword lookup
https://bugs.webkit.org/show_bug.cgi?id=61913
Summary
Improve keyword lookup
Oliver Hunt
Reported
2011-06-02 01:17:43 PDT
Created
attachment 95737
[details]
decision tree generator Currently the logic for identifying keywords in the lexer is somewhat awful -- essentially it does 1. Identify range of characters valid identifier characters 2. Convert range to an Identifier (hash lookup + appending to the identifier arena's identifier vector) 3. hash lookup into keyword table + yet another branch Switching to a direct coded decision tree allows us to drop steps 2 and 3 for keywords, and drop step 3 for all identifiers. I've attached a patch that adds a decision tree generator script (unfortunately i haven't added the machinery to automate the build, so you need to "python KeywordLookupGenerator.py Keywords.table > KeywordLookup.h" in Source/JavaScriptCore/parser It's still not perfect as it's using switches for the single character decisions, when we probably want to have carefully ordered if/else branches so common keywords are tested first. That said on my ad-hoc parser perf test it's around a 4% win on my macbook air, although i think it's closer to 1% on my mac pro when i last measured there. I've also spent some time fiddling with improving character match performance so it's not currently endian safe (lalalalala). Anyhoo, it's a work in progress, should be done tomorrow.
Attachments
decision tree generator
(12.29 KB, text/plain)
2011-06-02 01:17 PDT
,
Oliver Hunt
no flags
Details
Patch
(22.38 KB, patch)
2011-06-03 15:13 PDT
,
Oliver Hunt
no flags
Details
Formatted Diff
Diff
Patch
(22.37 KB, patch)
2011-06-03 15:57 PDT
,
Oliver Hunt
ggaren
: review+
Details
Formatted Diff
Diff
Example output
(8.32 KB, text/plain)
2011-06-03 15:57 PDT
,
Oliver Hunt
no flags
Details
Show Obsolete
(1)
View All
Add attachment
proposed patch, testcase, etc.
Zoltan Herczeg
Comment 1
2011-06-02 06:21:44 PDT
I have some ideas to speeding up the reserved identifier searching: Since we know the length of the token, we could select the length class of the token: Tokens with length of 2 if do in Tokens with length of 3 for new var try Tokens with length of 4 this else with enum null true case void Tokens with length of 5 while throw class super false break catch const Tokens with length of 6 export import return delete switch typeof Tokens with length of 7 default finally extends Tokens with length of 8 continue function debugger Tokens with length of 10 instanceof Decision trees usually start with the examination of the first character. Perhaps it could be faster to check other characters first. I.e: lets choose the tokens which length is 4: There are two of them, which starts with 'e', but the second character is always unique. Tokens with length of 2 : 2nd character is unique Tokens with length of 3 : 1st character is unique Tokens with length of 4 : 2nd character is unique Tokens with length of 5 : 3rd character is unique Tokens with length of 6 - 10 : 1st character is unique
Geoffrey Garen
Comment 2
2011-06-02 12:08:58 PDT
Removing switch should be a big improvement, since a switch is typically a pair of bounds checks followed by a table lookup followed by an indirect branch. Also, since you're testing for exact characters, you can drop the isIdentPart tests. To handle the last few characters in the script without special cases, perhaps you can either: (a) Coax the SourceCode mechanism to pad the end of every script with N zeroes OR (b) Change the lexer to, upon encountering a failed "N characters left" test, copy the remaining characters into a fixed buffer and pad the end of the buffer with N zeroes.
Geoffrey Garen
Comment 3
2011-06-02 12:09:29 PDT
> Since we know the length of the token, we could select the length class of the token:
Actually, one part of Oliver's optimization is to stop pre-computing the length of the token.
Oliver Hunt
Comment 4
2011-06-02 14:08:19 PDT
(In reply to
comment #2
)
> Removing switch should be a big improvement, since a switch is typically a pair of bounds checks followed by a table lookup followed by an indirect branch.
Yeah, however need to work on ordering to get the right choices made.
> > Also, since you're testing for exact characters, you can drop the isIdentPart tests.
The isIdentPart check is to check that we've reached the end of the token -- it can't be dropped
> > To handle the last few characters in the script without special cases, perhaps you can either: > (a) Coax the SourceCode mechanism to pad the end of every script with N zeroes > OR > (b) Change the lexer to, upon encountering a failed "N characters left" test, copy the remaining characters into a fixed buffer and pad the end of the buffer with N zeroes.
Zoltan Herczeg
Comment 5
2011-06-02 23:14:24 PDT
Hm I was thinking more about this. If we prefer switches over switches, we could use some divide and conquer strategy. A special binary search in other words. I.e: divide the tokens to two (or three?) groups based on the first character is lower or bigger than, let's say 'm'. If the groups have roughly equal size, that could help to know which tokens are not possible. And only machine predictable switches are used.
Geoffrey Garen
Comment 6
2011-06-03 13:21:30 PDT
(In reply to
comment #5
)
> Hm I was thinking more about this. If we prefer switches over switches, we could use some divide and conquer strategy. A special binary search in other words. I.e: divide the tokens to two (or three?) groups based on the first character is lower or bigger than, let's say 'm'. If the groups have roughly equal size, that could help to know which tokens are not possible. And only machine predictable switches are used.
Binary search is a good strategy when the likelihood of all inputs is evenly distributed; however, Oliver has found that not to be the case with keywords. Some keywords are much more popular than others.
Oliver Hunt
Comment 7
2011-06-03 15:13:00 PDT
Created
attachment 95973
[details]
Patch
Oliver Hunt
Comment 8
2011-06-03 15:57:07 PDT
Created
attachment 95977
[details]
Patch
Oliver Hunt
Comment 9
2011-06-03 15:57:54 PDT
Created
attachment 95978
[details]
Example output Example output
Geoffrey Garen
Comment 10
2011-06-03 16:05:59 PDT
Comment on
attachment 95977
[details]
Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=95973&action=review
r=me with these changes
> if ((remaining >= maxTokenLength) && !(lexType & IgnoreReservedWords)) {
WebKit style says this should be an early return (assuming an inline function).
> Source/JavaScriptCore/parser/Lexer.cpp:229 > +#if CPU(NEEDS_ALIGNED_ACCESS) > + > +#define COMPARE_CHARACTERS2(address, char1, char2) \ > + (((address)[0] == char1) && ((address)[1] == char2)) > +#define COMPARE_CHARACTERS4(address, char1, char2, char3, char4) \ > + (COMPARE_CHARACTERS2(address, char1, char2) && COMPARE_CHARACTERS2((address) + 2, char3, char4))
Seems like a lot of these macros could be inline functions instead.
> Source/JavaScriptCore/parser/Lexer.cpp:455 > +#include "KeywordLookup.h"
I think it would be better if KeywordLookup.h defined an inline function instead. Easier for debugging.
> Source/JavaScriptCore/parser/Lexer.h:116 > + template <int shiftAmount, bool shouldBoundsCheck> void internalShift();
An enum { DoBoundsCheck, DoNotBoundsCheck } would help make call sites more obvious. A stand-alone "true" or "false" is non-obvious.
Oliver Hunt
Comment 11
2011-06-03 16:30:38 PDT
Committed
r88076
: <
http://trac.webkit.org/changeset/88076
>
James Robinson
Comment 12
2011-06-03 17:18:27 PDT
This is producing a lot of stdout spew when compiling chromium (at least on linux):
http://build.webkit.org/builders/Chromium%20Linux%20Release/builds/31361/steps/compile-webkit/logs/stdio
I'm guessing this is not intentional?
Oliver Hunt
Comment 13
2011-06-03 17:42:41 PDT
(In reply to
comment #12
)
> This is producing a lot of stdout spew when compiling chromium (at least on linux): > >
http://build.webkit.org/builders/Chromium%20Linux%20Release/builds/31361/steps/compile-webkit/logs/stdio
> > I'm guessing this is not intentional?
Nope, i was trying to make the gyp build work -- it should be piping to the output file (i tried to copy what's done for the regexp table gen)
Joseph Pecoraro
Comment 14
2011-06-03 21:26:01 PDT
Cool patch! I can't imagine "debugger" ever being more common than "delete" in code we want to be fast. Probably just a nit on my part, but since this did give weights to keywords for popularity, maybe we could negatively weight debugging or reserved but currently unused keywords.
Oliver Hunt
Comment 15
2011-06-03 21:54:08 PDT
(In reply to
comment #14
)
> Cool patch! > > I can't imagine "debugger" ever being more common than "delete" in code we > want to be fast. Probably just a nit on my part, but since this did give weights > to keywords for popularity, maybe we could negatively weight debugging or > reserved but currently unused keywords.
The falloff is dramatic, delete may be more common than debugger (and other reserved words) but they're all so uncommon that the branch ordering is irrelevant by then -- it's probably not overly relevant for the less common of those that do have weights. My actual data shows delete to make up around 0.3% of keywords that we see. The reality is that if a token begins with 'de' it is more likely to not be a keyword than it is to be debugger, default or delete.
Ryosuke Niwa
Comment 16
2011-06-03 22:00:18 PDT
Mn... it seems like 5 tests are crashing on waterfall after this fix and
http://trac.webkit.org/changeset/88082
:
http://build.webkit.org/builders/SnowLeopard%20Intel%20Release%20%28Tests%29/builds/30035
Should we revert the changesets for now?
Oliver Hunt
Comment 17
2011-06-03 22:01:39 PDT
(In reply to
comment #16
)
> Mn... it seems like 5 tests are crashing on waterfall after this fix and
http://trac.webkit.org/changeset/88082
: >
http://build.webkit.org/builders/SnowLeopard%20Intel%20Release%20%28Tests%29/builds/30035
> > Should we revert the changesets for now?
nope, i'll look into it.
Ryosuke Niwa
Comment 18
2011-06-03 22:02:27 PDT
Also, lots of tests are crashing on Windows:
http://build.webkit.org/builders/Windows%20XP%20Debug%20%28Tests%29/builds/29299
http://build.webkit.org/results/Windows%20XP%20Debug%20(Tests)/r88089%20(29307)/results.html
Oliver Hunt
Comment 19
2011-06-03 22:06:32 PDT
(In reply to
comment #18
)
> Also, lots of tests are crashing on Windows: >
http://build.webkit.org/builders/Windows%20XP%20Debug%20%28Tests%29/builds/29299
>
http://build.webkit.org/results/Windows%20XP%20Debug%20(Tests)/r88089%20(29307)/results.html
Yeah, i see the problem. sadly adds an additional write I though i could avoid :-( Wonder why it didn't fail for me earlier :-/
Ryosuke Niwa
Comment 20
2011-06-03 22:15:23 PDT
(In reply to
comment #19
)
> Yeah, i see the problem. sadly adds an additional write I though i could avoid :-( > > Wonder why it didn't fail for me earlier :-/
It seems like the crashes are somewhat flaky :(
Oliver Hunt
Comment 21
2011-06-03 23:14:35 PDT
(In reply to
comment #20
)
> (In reply to
comment #19
) > > Yeah, i see the problem. sadly adds an additional write I though i could avoid :-( > > > > Wonder why it didn't fail for me earlier :-/ > > It seems like the crashes are somewhat flaky :(
Fix up for review in
https://bugs.webkit.org/show_bug.cgi?id=62086
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug