WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
41845
refactor identifier parsing in lexer
https://bugs.webkit.org/show_bug.cgi?id=41845
Summary
refactor identifier parsing in lexer
Zoltan Herczeg
Reported
2010-07-08 03:05:30 PDT
SS --parse-only: no change TEST COMPARISON FROM TO DETAILS ============================================================================= ** TOTAL **: ?? 31.3ms +/- 1.9% 31.4ms +/- 1.6% not conclusive: might be *1.003x as slow* ============================================================================= jquery: - 5.0ms +/- 0.0% 5.0ms +/- 0.0% 1.3.2: - 5.0ms +/- 0.0% 5.0ms +/- 0.0% mootools: - 5.0ms +/- 0.0% 5.0ms +/- 0.0% 1.2.2-core-nc: - 5.0ms +/- 0.0% 5.0ms +/- 0.0% prototype: ?? 5.8ms +/- 5.2% 5.9ms +/- 3.8% not conclusive: might be *1.017x as slow* 1.6.0.3: ?? 5.8ms +/- 5.2% 5.9ms +/- 3.8% not conclusive: might be *1.017x as slow* concat: - 15.5ms +/- 2.4% 15.5ms +/- 2.4% jquery-mootools-prototype: - 15.5ms +/- 2.4% 15.5ms +/- 2.4% SS: no change as well
Attachments
patch
(22.25 KB, patch)
2010-07-08 03:19 PDT
,
Zoltan Herczeg
no flags
Details
Formatted Diff
Diff
updated patch
(23.58 KB, patch)
2010-07-12 04:58 PDT
,
Zoltan Herczeg
no flags
Details
Formatted Diff
Diff
style patch
(18.46 KB, patch)
2010-07-13 05:55 PDT
,
Zoltan Herczeg
darin
: review-
Details
Formatted Diff
Diff
indentation patch
(15.59 KB, patch)
2010-07-14 00:44 PDT
,
Zoltan Herczeg
no flags
Details
Formatted Diff
Diff
the second part
(12.30 KB, patch)
2010-07-14 02:22 PDT
,
Zoltan Herczeg
no flags
Details
Formatted Diff
Diff
the second part again
(12.24 KB, patch)
2010-07-15 02:17 PDT
,
Zoltan Herczeg
no flags
Details
Formatted Diff
Diff
identifier parsing patch
(7.00 KB, patch)
2010-08-05 06:28 PDT
,
Zoltan Herczeg
no flags
Details
Formatted Diff
Diff
identifier parsing patch (v2)
(7.23 KB, patch)
2010-08-05 06:52 PDT
,
Zoltan Herczeg
no flags
Details
Formatted Diff
Diff
Show Obsolete
(4)
View All
Add attachment
proposed patch, testcase, etc.
Zoltan Herczeg
Comment 1
2010-07-08 03:19:54 PDT
Created
attachment 60862
[details]
patch
Zoltan Herczeg
Comment 2
2010-07-09 23:39:35 PDT
remove the r? from the patch until conflicts are resolved.
Zoltan Herczeg
Comment 3
2010-07-12 04:58:34 PDT
Created
attachment 61209
[details]
updated patch Although there is nothing new in the patch compared to the previous one (except the merge), it causes 8% slowdown on --parse-only now. Not worth to set r?
Zoltan Herczeg
Comment 4
2010-07-13 05:55:55 PDT
Created
attachment 61361
[details]
style patch
Darin Adler
Comment 5
2010-07-13 10:41:31 PDT
Comment on
attachment 61361
[details]
style patch Looks good.
> + // The first free types are fixed, and also used for > + // identifying ascii alpha and alphanumeric characters
I think you mean the first *three* types. This comment should have a period, and ASCII should be in all capitals.
> // 128 ascii codes
ASCII.
> -static unsigned short AsciiCharacters[128] = { > +static unsigned short asciiCharacters[128] = {
Can this be static const? The name of this array is not great. It's called asciiCharacters, but it's not an array of ASCII characters. It's actually an array of the character type of each ASCII character. So its name should be asciiCharacterTypes or typesOfASCIICharacters or something like that. Also, the CharacterTypes enum has a strange name. A value of this enum type is a single character type, so the type name should be CharacterType. Just as "int" is not named "ints".
> static inline bool isIdentStart(int c) > { > - return isASCII(c) ? isASCIIAlpha(c) || c == '$' || c == '_' : isNonASCIIIdentStart(c); > + return isASCII(c) ? asciiCharacters[c] == CharacterAlpha : isNonASCIIIdentStart(c); > }
CharacterAlpha seems like the wrong name for something that includes "$" and "_". Is this a term from the ECMAScript specification? If not, then I suggest that we don't call this CharacterAlpha, and instead call it CharacterIdentifierStart or something along those lines.
> static inline bool isIdentPart(int c) > { > - return isASCII(c) ? isASCIIAlphanumeric(c) || c == '$' || c == '_' : isNonASCIIIdentPart(c); > + return isASCII(c) ? asciiCharacters[c] <= CharacterNumber : isNonASCIIIdentPart(c); > }
Needs a comment to explain why "<=" is correct here.
> + // temporary contains an ascii representation of the code > + // helps to avoid code duplication
It's not clear to me what "temporary" is. Maybe you mean the local variable named "asciiCharacter". Comment should be sentence style. "ascii" should be "ASCII". I don't think this comment says anything more clearly than the code. Saying "avoid code duplication" is too vague to add much clarity. I suggest omitting this comment entirely.
> + if (isNonASCIIIdentStart(m_current)) > + asciiCharacter = 'a';
This code, however, does need a comment. The comment needs to say why it's OK to set the local variable to the letter "a". Comments are most valuable when they explain "why" rather than "what" -- the code itself makes it clear "what". In this case, we set asciiCharacter to any legal identifier starting character so we can get the right result from the code below, but the actual character is not used for anything else. I don't think the code needs to work this confusing way. Instead the local variable could just be a CharacterTypes value. CharacterType type; if (LIKELY(isASCII(m_current)) type = typesOfASCIICharacters[m_current]; else { if (isNonASCIIIdentStart(m_current)) type = CharacterIdentifierStart; else type = CharacterLineTerminator; } switch (type) { That code seems clear enough that you don't even need comments for it, and no less efficient than the code we have in the patch. Can you separate the indentation change of the giant switch statement from the rest of the patch somehow? The diff is hard to read because of that. I'm going to say review- because I think this change is unnecessarily oblique and it's easy to make it better. You could also consider some of my naming suggestions to make the code easier to understand.
Zoltan Herczeg
Comment 6
2010-07-13 10:57:56 PDT
I like your comments. Really makes things better.
> Can you separate the indentation change of the giant switch statement from the rest of the patch somehow? The diff is hard to read because of that.
You mean create a patch, which change only the indentation, and explain in the ChangeLog that the followup patch will use this indentation change?
Zoltan Herczeg
Comment 7
2010-07-14 00:44:56 PDT
Created
attachment 61478
[details]
indentation patch
Zoltan Herczeg
Comment 8
2010-07-14 02:22:01 PDT
Created
attachment 61492
[details]
the second part
Darin Adler
Comment 9
2010-07-14 08:06:16 PDT
Comment on
attachment 61492
[details]
the second part
> + (JSC::):
You should delete tis line in the ChangeLog, created because of a bug in the prepare-ChangeLog script.
> + // Character types are divided into two groups depending they can be part of an identifier or not. > + // Those, whose type value is less or equal than CharacterNumber, can be part of an identifier. > + // (see the CharacterType defintion for more details)
A few small mistakes here. One is "depending they", which should be "depending on whether they". The two commas are not needed or helpful in the second sentence. The sentence in parentheses should have a capital letter and a period. Patch otherwise looks good.
WebKit Commit Bot
Comment 10
2010-07-14 08:21:14 PDT
Comment on
attachment 61478
[details]
indentation patch Clearing flags on attachment: 61478 Committed
r63322
: <
http://trac.webkit.org/changeset/63322
>
WebKit Commit Bot
Comment 11
2010-07-14 08:21:19 PDT
All reviewed patches have been landed. Closing bug.
Zoltan Herczeg
Comment 12
2010-07-15 02:17:18 PDT
Created
attachment 61619
[details]
the second part again
Darin Adler
Comment 13
2010-07-15 06:34:34 PDT
Comment on
attachment 61619
[details]
the second part again JavaScriptCore/parser/Lexer.cpp:353 + // part of an identifier. (See the CharacterType defintion for more details.) Typo here: "defintion". JavaScriptCore/parser/Lexer.cpp:465 + if (UNLIKELY(((static_cast<unsigned>(m_current) - 0xE) & 0x2000))) { The tricky expression here should be inside a named inline function to document what it does. Even if we don't want to do that, at the very least it needs a comment because its non-obvious. That’s not new in this patch, though.
Zoltan Herczeg
Comment 14
2010-07-15 07:24:06 PDT
"the second part again" Landed in
http://trac.webkit.org/changeset/63423
The work is still not finished.
Zoltan Herczeg
Comment 15
2010-08-05 06:28:53 PDT
Created
attachment 63582
[details]
identifier parsing patch This patch reaches the speed of the old code with gotos. SunSpider --parse-only: TEST COMPARISON FROM TO DETAILS ============================================================================= ** TOTAL **: - 34.0ms +/- 2.2% 33.6ms +/- 1.5% ============================================================================= jquery: - 5.2ms +/- 5.8% 5.1ms +/- 4.4% 1.3.2: - 5.2ms +/- 5.8% 5.1ms +/- 4.4% mootools: - 5.7ms +/- 6.1% 5.4ms +/- 6.8% 1.2.2-core-nc: - 5.7ms +/- 6.1% 5.4ms +/- 6.8% prototype: - 6.5ms +/- 5.8% 6.4ms +/- 5.8% 1.6.0.3: - 6.5ms +/- 5.8% 6.4ms +/- 5.8% concat: ?? 16.6ms +/- 2.2% 16.7ms +/- 2.1% not conclusive: might be *1.006x as slow* jquery-mootools-prototype: ?? 16.6ms +/- 2.2% 16.7ms +/- 2.1% not conclusive: might be *1.006x as slow* Normal SunSpider reports no change.
Eric Seidel (no email)
Comment 16
2010-08-05 06:42:36 PDT
Attachment 63582
[details]
did not build on mac: Build output:
http://queues.webkit.org/results/3559933
Zoltan Herczeg
Comment 17
2010-08-05 06:52:13 PDT
Created
attachment 63583
[details]
identifier parsing patch (v2) removed the unused variable
Darin Adler
Comment 18
2010-08-05 08:21:48 PDT
Comment on
attachment 63583
[details]
identifier parsing patch (v2) Thanks. This looks good.
> +ALWAYS_INLINE JSTokenType Lexer::parseIdent(JSTokenData* lvalp, LexType lexType)
Since this is identifier parsing, how about naming it parseIdentifier? I know the file sometimes does abbreviate to "ident", but I find the full word easier to read as I almost always do in code in general.
> + while (true) { > + if (LIKELY(isIdentPart(m_current))) > + shift(); > + else if (LIKELY(m_current != '\\')) > + break; > + else { > + // \uXXXX unicode characters > + bufferRequired = true; > + if (identifierStart != currentCharacter()) > + m_buffer16.append(identifierStart, currentCharacter() - identifierStart); > + shift(); > + if (UNLIKELY(m_current != 'u')) > + return ERRORTOK; > + shift(); > + int character = getUnicodeCharacter(); > + if (UNLIKELY(character == -1)) > + return ERRORTOK; > + if (UNLIKELY(m_buffer16.size() ? !isIdentPart(character) : !isIdentStart(character))) > + return ERRORTOK; > + record16(character); > + identifierStart = currentCharacter(); > + } > + }
I prefer the "early return" style, so if it gives you the same efficiency I'd like this to be written as follows: if (LIKELY(isIdentPart(m_current))) { shift(); continue; } if (LIKELY(m_current != '\\')) break; ... With the rest of the loop body not indented.
> + // keywords are not checked if there was an \uXXXX in the identifier
This is a confusing comment. I think what we mean to say is that keywords must not be recognized in such cases. Saying keywords "are not checked" leaves it ambiguous whether we are describing a bug or a feature. Also WebKit preferred style is to capitalize comments and end with a period. Especially a comment like this one that is a full sentence.
> + // Continue to the next case
Would be better to have a period and use the terminology "Fall through" as we do in other similar comments.
> + ALWAYS_INLINE JSTokenType parseIdent(JSTokenData* lvalp, LexType lexType);
No need for the argument names lvavlp and lexType here. The types of the arguments speak for themselves without argument names and in such cases we omit the argument names in WebKit code.
Zoltan Herczeg
Comment 19
2010-08-06 04:07:52 PDT
Did the requested changes and landed in
http://trac.webkit.org/changeset/64827
Thank you Darin.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug