Bug 33860
Summary: | potential speed up of parser by recoding expression rules | ||
---|---|---|---|
Product: | WebKit | Reporter: | Patrick Mueller <pmuellr> |
Component: | JavaScriptCore | Assignee: | Nobody <webkit-unassigned> |
Status: | RESOLVED WONTFIX | ||
Severity: | Normal | CC: | ariya.hidayat, zack.carter, zherczeg, zoltan |
Priority: | P2 | ||
Version: | 528+ (Nightly build) | ||
Hardware: | All | ||
OS: | All |
Patrick Mueller
While playing with the bison parser for JavaScriptCore, I did a little googling research on bison optimization. Stumbled upon this:
http://compilers.iecc.com/comparch/article/97-03-086
which claims you can reduce the number of reductions by changing the way the rules are defined; basically flattening the recursive rules that define precedence, and adding the precedence back via %prec directives.
Just to see what was going on, I traced through the following expression (turned on the debug flags)
x = { a: 1 };
The parser walks the following steps when it hits the "1" token, consuming a "}" during the reduction:
NUMBER
Literal
PrimaryExprNoBrace
PrimaryExpr
MemberExpr
NewExpr
LeftHandSideExpr
PostfixExpr
UnaryExpr
MultiplicativeExpr
AdditiveExpr
ShiftExpr
RelationalExpr
EqualityExpr
BitwiseANDExpr
BitwiseXORExpr
BitwiseORExpr
LogicalANDExpr
LogicalORExpr
ConditionalExpr
AssignmentExpr
It seems to me that at least the rules from PostfixExpr down through ConditionalExpr would be good candidates to try this on.
Attachments | ||
---|---|---|
Add attachment proposed patch, testcase, etc. |
Patrick Mueller
In case the link in the summary goes away, here is the relevant text:
One more hint:
If the language has compicated expression syntax (like C)
it is more efficient to express operators precedence explicitly
by %prec directiove, instead of expression syntax tricks like
it is done in the ISO C and C++ Draft Standatd (where operators
precedence is expressed implicitly).
It is significanly reduces number of reduces (by mean of LALR
algorithm) while parsing.
For example:
[bad style]
AdditiveExpression:
AdditiveExpression AddOperator MultiplicativeExpression |
MultiplicativeExpression;
AddOperator:
lxmPlus |
lxmMinus;
MultiplicativeExpression:
MultiplicativeExpression MulOperator HighPrecExpression |
HighPrecExpression;
MulOperator:
lxmMul |
lxmDiv;
[better style]
Expression:
HighPrecExpression |
lxmLeftPar Expression lxmRightPar |
Expression lxmPlus Expression %prec pseudoAdditive |
Expression lxmMinus Expression %prec pseudoAdditive |
Expression lxmMul Expression %prec pseudoMultiplicative |
Expression lxmDiv Expression %prec pseudoMultiplicative;
Regards,
Alexander.
--
Alexander N. Krotoff krotoff@such.srcc.msu.su
Research Computer Center tel: +7(095)939-2638
Moscow State University fax: +7(095)939-4430
Zach Carter
Right, by using ambiguous rules and manually resolving the ambiguities using precedence directives, it should reduce the number of reductions needed. You can also set the precedence of an operator token directly:
%left '+' '-'
%left '*' '/'
etc...
E.g. any rule with a '+' token will have the associated precedence, although using %prec gives you direct control over which rule to apply the precedence to.
Patrick Mueller
Some stats from using YYDEBUG and then running jsc on the file SunSpider/tests/parse-only/jquery-1.3.2.js, looking at output lines of the form:
-> $$ = nterm [whatever}
which appear to be giving interim parse states.
240,083 total states
224,882 states containing "NoNode"
16,074 states containing "NoBF"
174,595 states containing "Expr_NoNode"
10,008 states containing "PostfixExpr_NoNode"
7,821 states containing "ConditionalExpr_NoNode"
so it looks like concentrating on just the "expressions", and the NoNode variety in particular, will hit a large swath of the reductions. The PostFixExpr and ConditionalExpr counts are interesting, because of the number of reductions from the first to the last. There are 13 states between them, so if you ball-park that 7.8K of them went through all 13 states, that's about 100K states. 40% of the states hit in total.
Patrick Mueller
In Bug 32722, comment 28, it's mentioned that Zoltan may be working on a replacement, non-Bison generated parser. In which case this bug can be RESOLVED WONTFIX or something.
Ariya Hidayat
JSC does not use bison-based parser anymore.