RESOLVED FIXED 34019
Custom-written JavaScript parser
https://bugs.webkit.org/show_bug.cgi?id=34019
Summary Custom-written JavaScript parser
Zoltan Herczeg
Reported 2010-01-22 13:41:11 PST
A bug entry for a custom-written JavaScript parser. FAQ:

- Why? I felt I could write a faster, specialized parser than bison.
- When did the work start? Start of January 2010 (3 weeks ago).
- Main components?
  - grammar: parsing and generating nodes
  - syntax checker: just parsing (used for checking function bodies to find the terminating '}')
- Can the grammar and syntax checker be merged? It would help maintain the code. Possible. I was thinking about that when I actually started to separate the grammar from the syntax checker. However, there are some key differences between them (for example, the syntax checker has to check nested function bodies), and the syntax checker has much simpler (and faster) code. When you open a simple page (like www.google.com), you also get a huge JS file with many functions. Most of them are never executed (browser-specific code for a different browser), but they are still parsed (syntax checked). I would not like to give up that speed advantage, and the code of the grammar is ugly anyway :(
- Advantages compared to bison? Speed. According to my measurements the grammar is 3 times and the syntax checker is 4 times faster than bison.
- Disadvantages? Readability. Although the code in grammar.y is ugly, it is repetitive and easy to understand.
- Focus of the work? Compatibility with the existing bison parser (and with the standard as well), especially detecting parse errors in the same way.
- But you skip possible optimizations this way... I know. But this is only a first step on a long road. When this parser is tested and goes mainline, we can think about improving it. There is actually a lot of room for such improvements, but not for now.
- Testing? See dump-AST.patch. You can dump the AST and compare the old and new ones. Right now 127 javascriptcore tests fail. Missing features: object literals, switch, the arguments feature (no idea).
- Supported build systems? Qt for now, but other ports are easy, since you only have to add JavaScriptCore/parser/Grammar.cpp, .h, SyntaxChecker.cpp, .h, GrammarConsts.h.
- Any help? See the comments in GrammarConsts.h.
- Anything else? See the LEGACY comments in the code. They could be improved soon, but that would break AST compatibility. In any case, we should do something about the divot points, since every node uses them in a different way. For now, compatibility is the first priority, but later we should define general rules for them, and all nodes should follow those rules.
Attachments
Dump AST (54.65 KB, patch)
2010-01-22 13:41 PST, Zoltan Herczeg
no flags
A new JavaScript Grammar (draft) (286.47 KB, patch)
2010-01-22 13:42 PST, Zoltan Herczeg
no flags
Improved Dump AST (55.76 KB, patch)
2010-01-25 06:33 PST, Zoltan Herczeg
no flags
A new JavaScript Grammar, which passes all jscore regression tests. (300.35 KB, patch)
2010-01-25 06:38 PST, Zoltan Herczeg
no flags
Selectable JavaScript grammar (178.63 KB, patch)
2010-01-26 03:19 PST, Zoltan Herczeg
no flags
Selectable JavaScript grammar (+mac build support) (184.80 KB, patch)
2010-01-26 06:40 PST, Zoltan Herczeg
oliver: review-
Next attempt (191.46 KB, patch)
2010-01-27 06:12 PST, Zoltan Herczeg
oliver: review-
Layout test for parser syntax checking (36.03 KB, patch)
2010-01-28 06:29 PST, Zoltan Herczeg
no flags
Another attempt (192.62 KB, patch)
2010-01-29 06:14 PST, Zoltan Herczeg
oliver: review-
Layout test for parser syntax checking (patch 2) (37.82 KB, patch)
2010-01-29 06:15 PST, Zoltan Herczeg
no flags
Next draft (211.49 KB, patch)
2010-02-03 06:35 PST, Zoltan Herczeg
oliver: review-
Merging syntax checker and grammar (177.49 KB, patch)
2010-02-05 05:20 PST, Zoltan Herczeg
no flags
template based approach (204.85 KB, patch)
2010-02-09 06:40 PST, Zoltan Herczeg
no flags
jquery-yacc.zip (6.74 MB, application/zip)
2010-02-09 15:16 PST, Geoffrey Garen
no flags
Updated patch (199.42 KB, patch)
2010-02-12 01:05 PST, Zoltan Herczeg
no flags
profiling data (152.00 KB, application/octet-stream)
2010-02-12 01:08 PST, Zoltan Herczeg
no flags
profiling data (maxFunctionDepth = 32) (151.04 KB, application/octet-stream)
2010-02-12 05:02 PST, Zoltan Herczeg
no flags
Version 4.1 (199.41 KB, patch)
2010-03-23 03:43 PDT, Zoltan Herczeg
no flags
dump AST current version (55.76 KB, patch)
2010-05-20 23:34 PDT, Zoltan Herczeg
no flags
latest version of my recursive descent parser (204.85 KB, patch)
2010-05-21 06:20 PDT, Zoltan Herczeg
no flags
Recursive descent parser (114.85 KB, patch)
2010-05-24 02:11 PDT, Oliver Hunt
no flags
latest patch (199.74 KB, patch)
2010-05-24 02:24 PDT, Zoltan Herczeg
no flags
More complete parser (117.89 KB, patch)
2010-05-28 23:25 PDT, Oliver Hunt
no flags
Updated parser (116.93 KB, patch)
2010-06-06 13:27 PDT, Oliver Hunt
no flags
faster parser (198.59 KB, patch)
2010-06-17 05:00 PDT, Zoltan Herczeg
no flags
Patch (132.06 KB, patch)
2010-06-23 18:17 PDT, Oliver Hunt
no flags
Patch (130.70 KB, patch)
2010-06-23 19:11 PDT, Oliver Hunt
barraclough: review+
Zoltan Herczeg
Comment 1 2010-01-22 13:41:45 PST
Created attachment 47222 [details] Dump AST
Zoltan Herczeg
Comment 2 2010-01-22 13:42:40 PST
Created attachment 47223 [details] A new JavaScript Grammar (draft)
Oliver Hunt
Comment 3 2010-01-22 14:35:50 PST
I'm unsure why the syntax checker is separate from the parser. Syntax checking is basically answering the question "can I parse this?". There are semantic checks that we will want to do while parsing:
- the various "features" we track
- variable and function accumulation
- partial AST generation
New things we _must_ be able to do in the new parser: strict mode identification needs to be done during the parse stage; it alters the state of the parser to reject constructs that lax mode would allow, and produces parse-time errors instead. For efficiency reasons this basically forces us into a recursive-descent parser :-/
Zoltan Herczeg
Comment 4 2010-01-22 14:57:39 PST
(In reply to comment #3)
> I'm unsure why the syntax checker is separate from the parser. Syntax checking
> is basically answering the question "can i parse this".

The grammar does both syntax checking and node generation. The syntax checker only checks syntax, and it is used only for skipping nested function bodies. That is what I tried to explain: they have a lot in common, but they have differences as well.

> There are semantic checks that we will want to do while parsing:
> the various "features" we track
> variable and function accumulation
> partial AST generation

That is what my grammar does :) It is "binary compatible" with the bison parser. The only thing which is not tracked is the "arguments feature", because I don't really understand its purpose yet :(

> New things we _must_ be able to do in the new parser:
> strict mode identification needs to be done during the parse stage and
> alters the state of the parser to reject constructs that lax mode would allow,
> and produces parse time errors instead). For efficiency reasons this basically
> forces us into a recursive-descent parser :-/

I saw the new ECMA standard, and I think I can handle the strict mode restrictions, at least those I have seen ("local identifiers cannot be used before a var declaration" seems a more complex one, since you have to store all the primary identifiers during parsing). I checked Wikipedia, and it turns out my parser is a predictive parser (it uses a stack, but never backtracks). Predictive parsers belong to the family of recursive descent parsers.
Oliver Hunt
Comment 5 2010-01-22 15:05:31 PST
(In reply to comment #4)
> (In reply to comment #3)
> > I'm unsure why the syntax checker is separate from the parser. Syntax checking
> > is basically answering the question "can i parse this".
>
> The grammar do both syntax checking and generating nodes. Syntax checker only
> checks syntax, and used only for skipping nested function bodies. That is what
> I tried to explain. They have lot in common, but they have differences as well.

Basically the parser should be set up with a flag along the lines of

    parse(int maxFunctionASTDepth) {
        ...
    }

then the various functions that create nodes should basically be

    if (currentFunctionDepth < maxFunctionDepth)
        newNode = new ...;

Having the ability to dynamically vary how deep we generate the tree will be beneficial for certain real-world cases.

> > There are semantic checks that we will want to do while parsing:
> > the various "features" we track
> > variable and function accumulation
> > partial AST generation
>
> That is what my grammar do :) It is "binary compatible" with the bison parser.
> The only thing which is not tracked is the "arguments feature" because I don't
> really understand its purpose yet :(

Whether a function ever references a variable with the name "arguments" -- we need to know whether the arguments object may have been used.

> > New things we _must_ be able to do in the new parser:
> > strict mode identification needs to be done during the parse stage and
> > alters the state of the parser to reject constructs that lax mode would allow,
> > and produces parse time errors instead). For efficiency reasons this basically
> > forces us into a recursive-descent parser :-/
>
> I saw the new ECMA standard, and I think I can hanle the strict mode
> restrictions, at least those which I saw (local identifiers cannot be used
> before a var declaration seems a more complex one, since you have to store all
> the primary identifiers during parsing). I checked wikipedia, and turned out my
> parser is a predictive parser (uses a stack, but never backtracks). The
> predictive parser belongs to the family of recursive descent parsers.

Everything should be trivial, I think; the parser is gnarly enough to make me sad though.
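A compilable sketch of the depth-limited node generation discussed above; the class and member names (Parser, m_maxFunctionDepth, m_currentFunctionDepth) are illustrative only, not the actual JavaScriptCore API:

    // Hypothetical sketch: build real AST nodes only for functions nested less
    // deeply than maxFunctionDepth; deeper bodies would just be syntax-checked.
    struct Node { };   // stand-in for a real AST node type

    class Parser {
    public:
        explicit Parser(int maxFunctionDepth = 1)
            : m_maxFunctionDepth(maxFunctionDepth)
            , m_currentFunctionDepth(0)
        {
        }

        Node* parseFunctionBody()
        {
            Node* body = 0;
            if (m_currentFunctionDepth < m_maxFunctionDepth)
                body = new Node;   // generate nodes at this nesting level
            // else: syntax-check only, remembering just the body's source range
            ++m_currentFunctionDepth;   // entering a nested function
            // ... parse statements, which may recurse into parseFunctionBody() ...
            --m_currentFunctionDepth;
            return body;
        }

    private:
        int m_maxFunctionDepth;
        int m_currentFunctionDepth;
    };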
Zoltan Herczeg
Comment 6 2010-01-22 15:20:03 PST
> Basically the parser should be set up with a flag along the lines of
> parse(int maxFunctionASTDepth) {
> ...
> }
>
> then the various functions to create nodes should basically be
> if (currentFunctionDepth < maxFunctionDepth)
> newNode = new ...;
>
> Having the ability to dynamically vary how deep we generate the tree will be
> beneficial for certain real world cases.

No problem. I could extend my grammar to generate nodes for nested functions as well (by default, the function depth counter will be 1, but you can increase it). OK, just tell me your requirements; I hope my parser-beast can handle anything :)
Oliver Hunt
Comment 7 2010-01-22 15:23:24 PST
(In reply to comment #6)
> > Basically the parser should be set up with a flag along the lines of
> > parse(int maxFunctionASTDepth) {
> > ...
> > }
> >
> > then the various functions to create nodes should basically be
> > if (currentFunctionDepth < maxFunctionDepth)
> > newNode = new ...;
> >
> > Having the ability to dynamically vary how deep we generate the tree will be
> > beneficial for certain real world cases.
>
> No problem. I could extend my grammar to go generate nodes for nested functions
> as well (by default, the function depth counter will be 1, but you can increase
> it).
>
> Ok, just tell me your requirements, I hope my parser-beast can handle anything
> :)

Any idea how easy/hard your model would make free variable analysis?
Zoltan Herczeg
Comment 8 2010-01-22 22:12:09 PST
> Any idea how easy/hard your model would make free variable analysis?

Agreed, it would be good to put some analysis effort in here. Features the new parser should support:

- generating nodes for nested functions
  effort: easy. Actually this feature was in the code, but I removed it to gain some speed.
- variable analysis
  effort: easy. We need some kind of 'set' where we can put the variables (I assume we are talking about primary literals here). Because the tokens returned by the lexer are analysed in order, we could easily check any dependency requirements between primary literals.
- some other strict mode requirements:
  - with is not supported
    effort: easy
  - eval is a reserved word
    effort: easy
  - arguments is a reserved word
    nooo, arguments again... anyway, probably easy
Zoltan Herczeg
Comment 9 2010-01-25 06:33:10 PST
Created attachment 47344 [details] Improved Dump AST
Zoltan Herczeg
Comment 10 2010-01-25 06:38:48 PST
Created attachment 47345 [details] A new JavaScript Grammar, which passes all jscore regression tests.

Compared against the bison parser on the SunSpider parser tests, namely jquery, mootools and prototype: the AST is the same. There are some divot differences because regular expression literals are not correctly handled by the bison parser, and line info for non-statements is useless in the bison parser.
Zoltan Herczeg
Comment 11 2010-01-25 06:44:34 PST
The mootools parser test showed an interesting behaviour of the bison parser on the following expression (simplified example):

    for (var i in x) for (var j in y) ;

According to the bison parser, j is declared earlier than i. My predictive parser handles the declaration order correctly.
Zoltan Herczeg
Comment 12 2010-01-25 14:06:14 PST
I am thinking about the future of this work. I will do layout and performance testing tomorrow, but there are still many other tasks ahead. It would be good to break this patch into smaller pieces, but that doesn't seem possible, since everything is needed: I cannot cut out things like 'switch', 'for' or 'function' parsing. Since this is a huge piece of code, many things will probably need to be improved, like changing variable names or restructuring code. This will need a lot of reviewer effort, for which I thank you in advance.

I planned this parser to be robust and future proof. It seems these advantages will be put to good use sooner than I expected, for things such as strict JS parsing or generating an AST for nested function bodies. However, it would be better to do them in separate patches rather than in an all-in-one package, and probably better to start that work after I get some feedback on the code.

I have one question about function nesting. We use ParserArenaData<DeclarationStacks::...> variables in the code, but FunctionBodyNode expects DeclarationStacks::... variables. What would be a good solution for this difference?

Anyway, although the code is big, the concept behind it is very simple. Step-by-step debugging on simple JS sources can easily reveal how it works.
Zoltan Herczeg
Comment 13 2010-01-26 03:19:08 PST
Created attachment 47396 [details] Selectable JavaScript grammar

The patch is reworked. It no longer removes the original bison parser; the new parser is selectable at compile time (ENABLE_PREDICTIVE_PARSER) and is disabled by default.
Zoltan Herczeg
Comment 14 2010-01-26 03:22:29 PST
New parser runtime results --suite parse-only:

TEST                          COMPARISON       FROM               TO                DETAILS
=============================================================================
** TOTAL **:                  2.19x as fast    69.5ms +/- 1.1%    31.8ms +/- 0.9%   significant
=============================================================================
  jquery:                     2.20x as fast    10.8ms +/- 2.8%    4.9ms +/- 4.6%    significant
    1.3.2:                    2.20x as fast    10.8ms +/- 2.8%    4.9ms +/- 4.6%    significant
  mootools:                   2.36x as fast    11.8ms +/- 2.6%    5.0ms +/- 0.0%    significant
    1.2.2-core-nc:            2.36x as fast    11.8ms +/- 2.6%    5.0ms +/- 0.0%    significant
  prototype:                  2.12x as fast    12.5ms +/- 3.0%    5.9ms +/- 3.8%    significant
    1.6.0.3:                  2.12x as fast    12.5ms +/- 3.0%    5.9ms +/- 3.8%    significant
  concat:                     2.15x as fast    34.4ms +/- 1.1%    16.0ms +/- 0.0%   significant
    jquery-mootools-prototype: 2.15x as fast   34.4ms +/- 1.1%    16.0ms +/- 0.0%   significant

No observable differences on other benchmarks (SunSpider, V8, WindScorpion).
WebKit Review Bot
Comment 15 2010-01-26 03:27:14 PST
Attachment 47396 [details] did not pass style-queue:

Failed to run "WebKitTools/Scripts/check-webkit-style" exit_code: 1
JavaScriptCore/parser/PredictiveGrammar.h:39: Code inside a namespace should not be indented. [whitespace/indent] [4]
JavaScriptCore/parser/PredictiveGrammarConsts.h:90: Code inside a namespace should not be indented. [whitespace/indent] [4]
JavaScriptCore/parser/SyntaxChecker.h:36: Alphabetical sorting problem. [build/include_order] [4]
JavaScriptCore/parser/SyntaxChecker.h:46: Code inside a namespace should not be indented. [whitespace/indent] [4]
JavaScriptCore/parser/Lexer.h:36: Code inside a namespace should not be indented. [whitespace/indent] [4]
JavaScriptCore/parser/Lexer.h:103: This { should be at the end of the previous line [whitespace/braces] [4]
JavaScriptCore/parser/Lexer.h:109: This { should be at the end of the previous line [whitespace/braces] [4]
Total errors found: 7

If any of these errors are false positives, please file a bug against check-webkit-style.
Zoltan Herczeg
Comment 16 2010-01-26 06:40:30 PST
Created attachment 47407 [details] Selectable JavaScript grammar (+mac build support)
WebKit Review Bot
Comment 17 2010-01-26 06:47:17 PST
Attachment 47407 [details] did not pass style-queue:

Failed to run "WebKitTools/Scripts/check-webkit-style" exit_code: 1
JavaScriptCore/parser/PredictiveGrammar.h:39: Code inside a namespace should not be indented. [whitespace/indent] [4]
JavaScriptCore/parser/PredictiveGrammarConsts.h:90: Code inside a namespace should not be indented. [whitespace/indent] [4]
JavaScriptCore/parser/SyntaxChecker.h:46: Code inside a namespace should not be indented. [whitespace/indent] [4]
JavaScriptCore/parser/Lexer.h:36: Code inside a namespace should not be indented. [whitespace/indent] [4]
Total errors found: 4

If any of these errors are false positives, please file a bug against check-webkit-style.
Oliver Hunt
Comment 18 2010-01-26 14:35:51 PST
Comment on attachment 47407 [details] Selectable JavaScript grammar (+mac build support)

I don't think there's any reason to still use cumulative "feature" flags. What you should do is have a function stack (very much pseudo code):

    struct FunctionInformation {
        Features features;
        Vector<VariableDeclaration> variables;
        ... etc ...
    }
    Vector<FunctionInformation> functions;

Then when the parser starts a new function it should simply do:

    functions.append(FunctionInformation);

And all the various constructs that add features become:

    functions.last().features |= NewFeatures;

    case VarDeclaration:
        ...
        functions.last().variables.add(theNewVariableDeclaration);
        ...

etc. (so we lose all the various array and feature unification rules)

The only reason we use cumulative feature flags currently is because yacc/bison generate bottom-up parsers, which by definition can't do this (the parser doesn't "know" it's producing a function until it has already parsed the child statements).

In terms of style, rather than the current

    top(-n) = new Node(top(), top(-1), ..., top(-n))

I would prefer

    push(new Node(pop(), pop(), ...))

which I think is much nicer.
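A self-contained approximation of the per-function bookkeeping sketched above, using standard containers and invented names (WebKit itself would use Vector and Identifier); it only shows how features and variables land directly on the current function instead of being merged up afterwards:

    // Hypothetical sketch, not the real parser: each function on the stack owns
    // its own feature bits and variable list, so nothing has to be unified later.
    #include <string>
    #include <vector>

    typedef unsigned Features;
    static const Features ArgumentsFeature = 1u << 0;
    static const Features WithFeature = 1u << 1;

    struct FunctionInformation {
        Features features;
        std::vector<std::string> variables;
        FunctionInformation() : features(0) { }
    };

    static std::vector<FunctionInformation> functionStack;

    static void startFunction()
    {
        functionStack.push_back(FunctionInformation());
    }

    static void sawVarDeclaration(const std::string& name)
    {
        functionStack.back().variables.push_back(name);
    }

    static void sawArgumentsIdentifier()
    {
        // The feature lands directly on the innermost function being parsed.
        functionStack.back().features |= ArgumentsFeature;
    }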
Zoltan Herczeg
Comment 19 2010-01-26 15:21:54 PST
(In reply to comment #18)
> (From update of attachment 47407 [details])
> I don't think there's any reason to still use cumulative "feature" flags.

I would prefer a scope flag myself, but many nodes actually use these features as arguments. For example, BinaryOpNode (and its derived classes) has rightHasAssignments (example: a + (b * c = d) -> the AddNode's rightHasAssignments must be true), TryNode has catchHasEval, BracketAccessorNode (and similar nodes) has subscriptHasAssignments, etc. It would be really hard to determine these flags without local features. Actually, I don't know how these flags modify the behaviour of emitByteCode, but if they are unnecessary, I could remove all of them :)

> What you should do is have a function stack (very much pseudo code):
> struct FunctionInformation {
>     Features features;
>     Vector<VariableDeclaration> variables;
>     ... etc ...
> }
> Vector<FunctionInformation> functions;

I have exactly the same idea about the function stack, except for a minor tweak: the topmost FunctionInformation is stored by the Grammar itself, and pushed onto the stack when a nested function is parsed. It is a little faster than an array access.

> case VarDeclaration:
>     ...
>     functions.last().variables.add(theNewVariableDeclaration);
>     ...
>
> etc
>
> (so we lose all the various array and feature unification rules)
>
> In terms of style rather than the current top(-n) = new Node(top(), top(-1),
> ..., top(-n)) i would prefer
> push(new Node(pop(), pop(), ...)
>
> Which i think is much nicer.

Wow, you are right. I didn't think about it before. This is actually possible, since the items are not freed on the top of the stack, and the values can be accessed before a push reuses the entry. I hope the compiler won't mess up multiple pop()s.
Guillaume Chereau
Comment 20 2010-01-26 23:39:43 PST
Hello all, I tried the new parser on the JavaScript parsing test from Holger's benchmarks (http://gitorious.org/qtwebkit/performance), and I get the following results on my laptop:

    bison parser : 192 ms
    new parser   : 115 ms

(it is ~1.6 times faster)

The test consists of parsing a set of JavaScript code taken from a list of popular websites. However, I couldn't test the whole set because of an error when parsing lines of this form: "new f(x++)". I get:

    ASSERTION FAILED: m_unaryMode == PrimaryMode || m_unaryMode == PrimaryDotMode || m_unaryMode == PrimaryArgumentMode
    (../../../JavaScriptCore/parser/PredictiveGrammar.cpp:273 void JSC::Grammar::primaryExpressionToNode())
    Segmentation fault
Zoltan Herczeg
Comment 21 2010-01-27 06:12:29 PST
Created attachment 47525 [details] Next attempt
WebKit Review Bot
Comment 22 2010-01-27 06:13:49 PST
Attachment 47525 [details] did not pass style-queue:

Failed to run "WebKitTools/Scripts/check-webkit-style" exit_code: 1
JavaScriptCore/parser/PredictiveGrammar.h:39: Code inside a namespace should not be indented. [whitespace/indent] [4]
JavaScriptCore/parser/PredictiveGrammarConsts.h:90: Code inside a namespace should not be indented. [whitespace/indent] [4]
JavaScriptCore/parser/SyntaxChecker.h:46: Code inside a namespace should not be indented. [whitespace/indent] [4]
JavaScriptCore/parser/Lexer.h:36: Code inside a namespace should not be indented. [whitespace/indent] [4]
JavaScriptCore/parser/PredictiveGrammar.cpp:2699: Should have a space between // and comment [whitespace/comments] [4]
Total errors found: 5

If any of these errors are false positives, please file a bug against check-webkit-style.
Zoltan Herczeg
Comment 23 2010-01-27 06:27:34 PST
Thank you for the feedback!

> (it is ~ 1.6 time better)

Great! I hope that was in release mode :)

> "new f(x++)"

Thanks. I also found some bugs myself, and the fast/js layout tests showed me some others, like: (1++).x; a(x++).b; for (new a in c) { }, and "a + b = c" should fail in the syntax checker as well. All of them are fixed in the newest patch.

The constructor of the parser was extended to:

    Grammar(JSGlobalData* globalData, int maxFunctionDepth = 1, bool strict = false)

I started to work on maxFunctionDepth first, and it is supported now. It is tested on small examples, and all JSCore tests pass with maxFunctionDepth = 3. The 'strict' argument is just added for future use; I really want to implement the strict mode checks in a different patch :)

There are only two fast/js tests which fail:

fast/js/reparsing-semicolon-insertion.html - the comment says:
    // According to the ECMA spec, these should all be syntax errors. However, the
    // pre-existing behaviour of JavaScriptCore has always been to accept them.

fast/js/function-declaration-statement.html - crashes. No idea what is happening here (yet), since the JS code snippets are parsed correctly by the command line tool.
Oliver Hunt
Comment 24 2010-01-27 15:54:36 PST
Comment on attachment 47525 [details] Next attempt

I don't like the

    case '-': ....
    case '{': ...

etc. idiom. You should stick to the symbolic names like OPENBRACE -- this will also reduce the lexer diffs, and reduce the risk of accidentally introducing a symbolic name that overloads a character.

We also need to get the full set of tests to parse correctly. Then we need to fuzz this (we don't have many -- if any -- invalid script tests, alas).

--Oliver
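For illustration, a tiny sketch of the difference (the token names here are invented, not JavaScriptCore's real token set): switching on named lexer tokens rather than raw character literals keeps one-character and multi-character tokens from colliding:

    // Hypothetical token enum; the lexer would return these instead of raw chars.
    enum JSTokenType { OPENBRACE, CLOSEBRACE, PLUS, MINUS, MINUSMINUS, EOFTOK };

    static bool isAdditiveOperator(JSTokenType token)
    {
        switch (token) {
        case PLUS:
        case MINUS:        // preferred over: case '-':
            return true;
        default:
            return false;
        }
    }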
Zoltan Herczeg
Comment 25 2010-01-28 06:29:29 PST
Created attachment 47616 [details] Layout test for parser syntax checking
Zoltan Herczeg
Comment 26 2010-01-28 06:41:33 PST
(In reply to comment #25)
> Created an attachment (id=47616) [details]
> Layout test for parser syntax checking

This test contains syntax checking examples. Both parsers can handle these test cases. However, I noticed something about the bison parser:

    function f() { return 6 + }

is accepted, although

    return 6 +

is not.
Zoltan Herczeg
Comment 27 2010-01-29 06:14:04 PST
Created attachment 47711 [details] Another attempt
Zoltan Herczeg
Comment 28 2010-01-29 06:15:11 PST
Created attachment 47712 [details] Layout test for parser syntax checking (patch 2) Adding more tests.
WebKit Review Bot
Comment 29 2010-01-29 06:17:37 PST
Attachment 47711 [details] did not pass style-queue:

Failed to run "WebKitTools/Scripts/check-webkit-style" exit_code: 1
JavaScriptCore/parser/PredictiveGrammar.h:39: Code inside a namespace should not be indented. [whitespace/indent] [4]
JavaScriptCore/parser/PredictiveGrammarConsts.h:90: Code inside a namespace should not be indented. [whitespace/indent] [4]
JavaScriptCore/parser/SyntaxChecker.h:46: Code inside a namespace should not be indented. [whitespace/indent] [4]
JavaScriptCore/parser/Lexer.h:36: Code inside a namespace should not be indented. [whitespace/indent] [4]
Total errors found: 4

If any of these errors are false positives, please file a bug against check-webkit-style.
Zoltan Herczeg
Comment 30 2010-01-29 06:28:48 PST
(In reply to comment #24)
> (From update of attachment 47525 [details])
> I don't like the
> case '-': ....
> case '{': ...
> etc

Done. I also found another two errors in my parser: "for (a,b in c ; ;) {}" was accepted, and "if (a) function f() {}" caused a segfault. Both of them are fixed (these were minor bugs; only a one-line change was required). Now all jscore and layout tests pass successfully. Finally :)

> We also need to get the full set of tests to parse correctly. Then we need to
> fuzz this (we don't have many -- if any -- invalid script tests alas)

I have attached a new patch here. It contains about 250 strings, half of them valid JavaScript source code and the other half not. The strings were tested both as plain strings and with a "function f() {}" function declaration wrapped around them. This is useful since, for example, "return 6+" fails with a syntax error in the bison parser, while "function f() { return 6+ }" does not.

I hope the patch is not far from final. It is hard to update this amount of code again and again :) When it reaches a nearly finished state, I will add a ChangeLog entry and update the mac project rules.
Zoltan Herczeg
Comment 31 2010-01-29 06:51:54 PST
TEST                          COMPARISON       FROM               TO                DETAILS
=============================================================================
** TOTAL **:                  2.16x as fast    69.5ms +/- 1.3%    32.2ms +/- 0.9%   significant
=============================================================================
  jquery:                     2.14x as fast    10.7ms +/- 3.2%    5.0ms +/- 0.0%    significant
    1.3.2:                    2.14x as fast    10.7ms +/- 3.2%    5.0ms +/- 0.0%    significant
  mootools:                   2.26x as fast    11.3ms +/- 3.1%    5.0ms +/- 0.0%    significant
    1.2.2-core-nc:            2.26x as fast    11.3ms +/- 3.1%    5.0ms +/- 0.0%    significant
  prototype:                  2.08x as fast    12.7ms +/- 2.7%    6.1ms +/- 3.7%    significant
    1.6.0.3:                  2.08x as fast    12.7ms +/- 2.7%    6.1ms +/- 3.7%    significant
  concat:                     2.16x as fast    34.8ms +/- 0.9%    16.1ms +/- 1.4%   significant
    jquery-mootools-prototype: 2.16x as fast   34.8ms +/- 0.9%    16.1ms +/- 1.4%   significant

Since the code is near final, I have measured the gain again. No surprises here. We have tried it on real web pages, but unfortunately we experienced about 100% deviation, so that comparison is useless. (I just got a message that the regression tests pass on mac as well! Excellent news for me.)

charlie137, would you repeat your measurements again?
Guillaume Chereau
Comment 32 2010-01-29 10:05:31 PST
My previous test was in debug mode, which is why the difference between the old and new parser was not as big. Here are the results (in ms) using the full data set, in debug and release mode, for the old and new parser:

    | mode    | old | new |
    |---------+-----+-----|
    | debug   | 836 | 689 |
    | release | 204 |  75 |

Strangely, with this dataset the difference is not that big in debug mode, but huge in release mode (compiled with -O3).
Oliver Hunt
Comment 33 2010-02-01 12:26:48 PST
Comment on attachment 47711 [details] Another attempt

I really don't like the top(-n) stuff. I would much rather all were replaced with pop(), etc. I think if there isn't a performance hit it would be _much_ clearer to have all code be of the form

    GrammarStackItem rhs = pop();
    GrammarStackItem lhs = pop();
    push(new AddNode(lhs, rhs));

Or whatever, rather than

    top(-2) = new AddNode(top(-1), top(-2));
    pop();
Zoltan Herczeg
Comment 34 2010-02-03 06:35:37 PST
Created attachment 48022 [details] Next draft

Top-less version :D (+25% .cpp code size, +0% functionality). JS core and layout tests pass. Fortunately, the parsing speed is not affected (based on SunSpider --parse-only tests). This was more difficult work than I expected, but it is finally finished.
Zoltan Herczeg
Comment 35 2010-02-03 07:04:13 PST
Oliver, what is your opinion on the syntax checker patch?

Tasks ahead:
* I haven't broken up the node generator mechanism yet. Partial object generation would only be helpful for a few complex nodes, like try, switch and for (and they need to do some consistency checks before emitByteCode). Unfortunately, we still have to store the parsing status, which means we can perhaps decrease the required number of stack items from 3 down to 2.
* Line info and divot points. It is a mess now; every node does things differently (depending on how bison parses them). We should introduce clear rules and follow them. We can do that now using our predictive parser.
* Generating parser rules. I have no idea how to do this, since we have so many different language constructs. I couldn't find any real node grouping rules. The two major groups are expressions and statements; statements are more difficult to implement than expressions.

Is there anything else?
Oliver Hunt
Comment 36 2010-02-03 16:17:10 PST
Comment on attachment 48022 [details] Next draft

Okay, I'm going to say that the separate syntax checker should go -- there's too much risk of divergent behaviour between the syntax checker and the real parser. If we want pure syntax checking to be faster, we just need a call into the parser that says it shouldn't build an AST at any level.
Zoltan Herczeg
Comment 37 2010-02-05 05:20:12 PST
Created attachment 48223 [details] Merging syntax checker and grammar The merge is done, although I am not sure this is a step ahead, since it is slower. *1.028x as slow* 32.6ms +/- 1.1% 33.5ms +/- 2.1% significant I feel the code can be improved (and fast/js/function-dot-arguments.html fails), but first I would like to hear your opinion whether I should continue this work or step back to the previous patch. Besides, is the parser layout test OK?
Zoltan Herczeg
Comment 38 2010-02-05 05:52:04 PST
> *1.028x as slow* 32.6ms +/- 1.1% 33.5ms +/- 2.1% significant Just for the sake of clarity, it is compared to the previous version with a separated checker. Compared to the bison parser, it is still more than 2 times faster. 2.07x as fast 69.5ms +/- 1.3% 33.5ms +/- 2.1%
Darin Adler
Comment 39 2010-02-05 06:53:01 PST
A way to avoid having two copies of the code and yet still have the non-syntax-check version be fast is to use C++ templates. You can have a boolean template argument decide whether functions are for syntax checking or for compiling, and then avoid adding any branches.

I considered mentioning this right when Oliver did his review, but for some reason assumed that performance would not be a problem. If this technique works then we get the best of both worlds: only one set of source code to maintain, but the full speed we'd otherwise get by having the syntax checker entirely separate.
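A minimal sketch of this technique, with invented builder types (the names below are not the ones the patch uses): the same templated parsing code is instantiated once with a tree builder and once with a do-nothing checker, so the checking instantiation carries no AST-building branches:

    // Hypothetical sketch of template-selected parsing modes.
    struct ExpressionNode { double value; };

    struct ASTBuilder {
        typedef ExpressionNode* Expression;
        Expression makeNumber(double value)
        {
            ExpressionNode* node = new ExpressionNode;
            node->value = value;
            return node;
        }
    };

    struct SyntaxChecker {
        typedef bool Expression;                          // just "a valid expression was seen"
        Expression makeNumber(double) { return true; }    // no allocation at all
    };

    template<class TreeBuilder>
    typename TreeBuilder::Expression parseNumericLiteral(TreeBuilder& builder, double value)
    {
        // Identical parsing code for both modes; the compiler specializes each one.
        return builder.makeNumber(value);
    }

With this shape, "check only" parsing is just a different template instantiation: one body of source code, two compiled specializations, and no runtime flag tested inside the shared code.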
Zoltan Herczeg
Comment 40 2010-02-05 08:15:31 PST
> A way to avoid having two copies of the code and yet still have the
> non-syntax-check version be fast, is to use C++ templates.

I like this idea. A difference between the syntax checker and the grammar is not only the simplified code, but also the simplified data structures. Fortunately, the data structures can also be template arguments. My question is: what is the best way to do it? What about the following:

    template<bool syntaxChecker, class StackItem, class MemberVariables> class Grammar {};

    struct GrammarStackItem { ... };
    struct SyntaxCheckerStackItem { ... };
    struct SyntaxCheckerMemberVariables { ... };
    struct GrammarMemberVariables {
        ...
        Grammar<true, SyntaxCheckerStackItem, SyntaxCheckerMemberVariables> checker;
    };

I hope the declaration order will be OK this way.
Zoltan Herczeg
Comment 41 2010-02-05 21:48:57 PST
Hmm, I am not sure all compilers will like accessing non-existing members, even if the branch is known to be false. Instead, I could use ALWAYS_INLINE get/set methods, and put ASSERT_NOT_REACHED in those getters/setters which do not exist for the current structure.
Darin Adler
Comment 42 2010-02-07 13:07:12 PST
(In reply to comment #41)
> Hm, I am not sure all compiler will like to access non-existing members even if
> the a branch is known to be false.

It won't.

> Instead, I could use ALWAYS_INLINE get/set
> methods, and put ASSERT_NOT_REACHED to those getters/setter which are not exist
> for the current structure.

That might work. Generally the technique is to have functions, in the class passed to the template, that perform all access to the conditional members, and then empty functions with the same names in the other class passed to the template.
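An illustrative sketch of the mirrored-accessor idea Darin describes (the member and accessor names here are made up): both classes expose the same interface, so templated parser code compiles against either one, and the checker's versions simply do nothing:

    // Hypothetical sketch: real members in the grammar's class, empty stand-ins
    // in the syntax checker's class.
    struct GrammarLocalMembers {
        int assignmentCount;
        GrammarLocalMembers() : assignmentCount(0) { }
        void noteAssignment() { ++assignmentCount; }
        int assignments() const { return assignmentCount; }
    };

    struct SyntaxCheckerLocalMembers {
        void noteAssignment() { }             // no storage; compiles away
        int assignments() const { return 0; }
    };

    template<class LocalMembers>
    void parseAssignmentExpression(LocalMembers& members)
    {
        // ... parse "lhs = rhs" ...
        members.noteAssignment();             // harmless no-op when syntax checking
    }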
Oliver Hunt
Comment 43 2010-02-08 12:15:55 PST
Comment on attachment 47712 [details] Layout test for parser syntax checking (patch 2) r=me on the testcase -- there's no reason not to have the test in the repository
Eric Seidel (no email)
Comment 44 2010-02-08 15:18:08 PST
Attachment 47712 [details] was posted by a committer and has review+, assigning to Zoltan Herczeg for commit.
Zoltan Herczeg
Comment 45 2010-02-09 01:35:06 PST
Comment on attachment 47712 [details] Layout test for parser syntax checking (patch 2) Landed in r54530
Zoltan Herczeg
Comment 46 2010-02-09 06:40:22 PST
Created attachment 48409 [details] template based approach

Finally finished. Actually, this was the hardest work so far. Darin's prediction was right: this approach is nearly as fast as the separate grammar / syntax checker approach (32.6ms, up from 32.2ms). Besides, the code is nearly as big :) Regardless, I think this is the best trade-off between speed and maintainability.

I have a small design question. I know that functions defined inside a class definition are implicitly inline, but I still like to put an ALWAYS_INLINE macro before the definition to stress that I want it to be an inline function. This information can be useful for those who want to change the code. What is WebKit's policy here? (I couldn't find anything about it in the coding guidelines.)
Alexey Proskuryakov
Comment 47 2010-02-09 08:17:29 PST
We usually add ALWAYS_INLINE when profiling suggests measurable benefit from it.
Darin Adler
Comment 48 2010-02-09 10:46:25 PST
Comment on attachment 48409 [details] template based approach The pbxproj part of this patch is failing to apply, so the trybots are unable to evaluate it.
Darin Adler
Comment 49 2010-02-09 10:47:33 PST
(In reply to comment #47) > We usually add ALWAYS_INLINE when profiling suggests measurable benefit from > it. Yes, that's the guideline. Don't use it unless it's needed to improve performance, and feel free to use it if it does improve performance. Same for NEVER_INLINE.
Zoltan Herczeg
Comment 50 2010-02-09 11:47:14 PST
(In reply to comment #48)
> (From update of attachment 48409 [details])
> The pbxproj part of this patch is failing to apply, so the trybots are unable
> to evaluate it.

I haven't updated my project file for more than a week, and I don't intend to do it until major changes are required, since otherwise I cannot compare the performance gains or losses anymore. When you say the general direction is OK and only smaller reworkings are required, I will of course update the project. Each of the last 3 patches changed at least 50% of the codebase, and I wanted to make sure I don't mess something up :)

Thank you for your advice about ALWAYS_INLINE. I prefer all one-line getters/setters to be inline functions.
Darin Adler
Comment 51 2010-02-09 11:59:36 PST
(In reply to comment #50)
> Thank you for your advices about ALWAYS_INLINE. I prefer all 1 line getter
> setters to be inline functions.

Yes. But please note that we only use ALWAYS_INLINE if there’s some reason they are not getting inlined. We don’t use that macro to express preferences.
Darin Adler
Comment 52 2010-02-09 12:01:18 PST
The major problem remaining here is that this new parser may be too difficult to modify when we make changes to the language or fix bugs. I believe Oliver is investigating generating this parser from some sort of grammar, which seems like a way to get a parser that works just like this, but remain flexible.
Geoffrey Garen
Comment 53 2010-02-09 14:52:26 PST
These comments probably would have been more useful earlier in your development process. Sorry about that.

I wish we knew exactly what was slow about our YACC parser. I hope we're not just falling prey to Second System Syndrome here, and throwing out working code that could be optimized. Do you have any data on where the YACC parser spends its time?

If I had to guess, I would guess that the main optimization in this rewrite, as measured by SunSpider, was to do less work in the syntax checking case. But there seems to be low-hanging fruit in the existing YACC grammar for doing those same optimizations. Did you consider that option? If so, are there significant barriers to it?

I found the code in this patch a bit hard to follow, and I'm worried that it might be difficult to modify this parser in the future. Here are a few things that could make this patch clearer:

* I don't think there's any need for the ENABLE(PREDICTIVE_PARSER) flag. Having two parsers just makes the code harder to follow. I think this patch can just remove the old parser -- that's its intent, after all.
* Why does the parser keep its own m_stackPointer instead of just using Vector::size and Vector::last?
* It's a surprising idiom to grow the stack, and then modify the top-most item. I would recommend creating a stack entry and pushing it onto the top of the stack instead.
* The two different copies of the *StackItem and *LocalMembers classes make the structure of the stack hard to follow. Did you provide two copies of these classes just to support ASSERT_NOT_REACHED calls? If so, is there another way to accomplish what those assertions try to accomplish?
Geoffrey Garen
Comment 54 2010-02-09 15:15:57 PST
I'm attaching an example of the kind of analysis you could do of YACC (jquery-yacc.zip). I set YYDEBUG to 1 and ran the jquery parsing test. It looks like the most common state the parser was in was 607, which is a choice between shifting '.', '(', or '[', and reducing MemberExpr_NoNode or NewExpr_NoNode.
Geoffrey Garen
Comment 55 2010-02-09 15:16:52 PST
Created attachment 48444 [details] jquery-yacc.zip
Zoltan Herczeg
Comment 56 2010-02-10 00:57:23 PST
Thank you for your comments.

> I wish we knew exactly what was slow about our YACC parser. I hope we're not
> just falling prey to Second System Syndrome here, and throwing out working code
> that could be optimized. Do you have any data on where the YACC parser spends
> its time?

This is only half of the story. The real issue here is not the present, but the future of JavaScript. The strict mode defined by ECMA requires an LL parser. This example was mentioned before:

    for (var a in b) for (var c in a) { }

According to the bison parser, 'c' is defined first and 'a' later, which is false. This is in the nature of a bottom-up parser (right-recursive LALR grammar), and probably cannot be helped. In strict mode, the "var id" definition must precede the first use of "id". The above statement is valid in strict mode, since "var a" is defined before it is used. However, this check cannot be done during parsing in a bison parser. Post-parsing checks take extra time, and they would be really hard for nested function bodies, since no AST is generated for them. That means we need an LL (left-recursive) grammar to do these strict mode checks at parse time. Unfortunately, JavaScript is not left recursive, which makes things difficult. Oliver is trying to find a solution, which would basically be an automatically generated LL parser that can enter right-recursive mode when necessary. The task is really challenging, but if it were successful (with the resulting parser much faster than bison) it would be worth some nice research papers at some big conferences (or even more). Anyway, your idea is good; profiling can help with further parser optimizations.

> If I had to guess, I would guess that the main optimization in this rewrite, as
> measured by SunSpider, was to do less work in the syntax checking case. But
> there seems to be low-hanging fruit in the existing YACC grammar for doing
> those same optimizations. Did you consider that option? If so, are there
> significant barriers to it?

Actually, no. The node generator is 3 times faster (in pure speed, excluding everything else, including the lexer). The syntax checking is a little better; it is 4 times faster. Besides, do you think it would be worth cutting the bison grammar into two files, one for the NoNode rules and one for everything else?

> * I don't think there's any need for the ENABLE(PREDICTIVE_PARSER) flag. Having
> two parsers just makes the code harder to follow. I think this patch can just
> remove the old parser -- that's its intent, after all.

This is the long-term plan, but we should give all ports some time to prepare for the change. When it is enabled on all ports, we can simply remove grammar.y. In my first patch, I removed grammar.y, but others convinced me not to do it now.

> * Why does the parser keep its own m_stackPointer instead of just using
> Vector::size and Vector::last?

Because the parser stack is ever-growing during parsing. The members of a stack item must still be accessible after it is popped from the stack; an automatic shrink could cause a segmentation fault.

> * It's a surprising idiom to grow the stack, and then modify the top-most item.
> I would recommend creating a stack entry and pushing onto the top of the stack
> instead.

The stack item is really complex, and usually only some members are actually used. We can gain some speed here by not copying unnecessary blocks. Anyway, I think I could change this behaviour if you wish; I don't think the gain would be that significant.

> * The two different copies of the *StackItem and * LocalMembers classes make
> the structure of the stack hard to follow. Did you provide two copies of these
> classes just to support ASSERT_NOT_REACHED calls? If so, is there another way
> to accomplish what those assertions try to accomplish?

StackItem can be SyntaxCheckerStackItem or GrammarStackItem, and LocalMembers can be SyntaxCheckerLocalMembers or GrammarLocalMembers. Naturally, at least one (and sometimes both) of these implementations does not contain an ASSERT_NOT_REACHED for a given getter or setter. We need these inline function calls to support templates.
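A small sketch of the non-shrinking stack behaviour described above (the class and method names are invented, not the patch's actual code): entries stay readable after pop() until a later push() reuses their slot, which is what makes the push(new Node(pop(), pop(), ...)) idiom safe:

    // Hypothetical sketch: storage only ever grows; pop() just moves the top index.
    #include <cstddef>
    #include <vector>

    template<class Item>
    class ParserStack {
    public:
        ParserStack() : m_top(0) { }

        void push(const Item& item)
        {
            if (m_top == m_storage.size())
                m_storage.push_back(item);   // grow only when no old slot is free
            else
                m_storage[m_top] = item;     // reuse a previously popped slot
            ++m_top;
        }

        // The returned reference stays valid until a later push() overwrites the slot.
        Item& pop() { return m_storage[--m_top]; }

        // top() is the newest item; top(-1) the one below it, matching the
        // top(-n) notation used in the thread.
        Item& top(int offset = 0) { return m_storage[m_top - 1 + offset]; }

    private:
        std::vector<Item> m_storage;
        std::size_t m_top;
    };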
Zoltan Herczeg
Comment 57 2010-02-12 01:05:51 PST
Created attachment 48631 [details] Updated patch Updated patch. Actually ALWAYS_INLINE really improves the performance for getter/setters, so I decided to keep them.
Zoltan Herczeg
Comment 58 2010-02-12 01:08:45 PST
Created attachment 48632 [details] profiling data

profiling - gprof on "cat concat-jquery-mootools-prototype.js" 16 times
coverage - gcov on concat-jquery-mootools-prototype.js

Interesting, really.
Zoltan Herczeg
Comment 59 2010-02-12 05:02:07 PST
Created attachment 48638 [details] profiling data (maxFunctionDepth = 32)

The previous profiling data was mainly syntax checking, since the nested functions are never called (nodes are not generated for them). However, our shiny custom parser supports deeper AST generation. I set maxFunctionDepth to 32, which is probably deep enough for the parsing tests, and repeated the measurements.
Eric Seidel (no email)
Comment 60 2010-02-17 15:29:17 PST
I'm confused. Is this bug intentionally marked as a security bug?
Oliver Hunt
Comment 61 2010-02-17 15:36:30 PST
(In reply to comment #60) > I'm confused. Is this bug intentionally marked as a security bug? I don't believe so, if i don't see anything to the contrary in the next few hours i'll drop the flag.
Zoltan Herczeg
Comment 62 2010-02-19 13:30:50 PST
I was informed that this work has no future, so I think this bug should be closed. I plan to write a blog post summarizing the results, and it would be good to link the patches here. Could you drop the security flag? (Could someone give me the right to set the WONTFIX flag in bugzilla?)
Eric Seidel (no email)
Comment 63 2010-02-19 13:34:16 PST
I've fixed your bug permissions. Dropping the security flag per Oliver's comment above.
Oliver Hunt
Comment 64 2010-02-19 13:38:14 PST
(In reply to comment #62) > I was informed this work has no future, so I think this bug should be closed. I > plan to write a blog post about summarizing the results, and would be good to > link the patches here. Could you drop the security flag? (Could someone give me > the right to set the WONTFIX flag in the bugzilla?) I'm not sure we want to close this bug -- maybe we want to retitle this bug, something like "make a decent parser in the model of zoltan's awesome one" ?
Darin Adler
Comment 65 2010-02-20 09:37:17 PST
(In reply to comment #62) > I was informed this work has no future I'm surprised to hear that! I thought there was a big feature for this. My hope is we can use this work to create a fast parser, but enhance it further, restoring the feature the bison one has, where you can change the syntax without having to be an expert on the parser internals.
Darin Adler
Comment 66 2010-02-20 09:37:31 PST
(In reply to comment #65) > I thought there was a big feature for this. future
Oliver Hunt
Comment 67 2010-02-20 18:34:53 PST
(In reply to comment #65)
> (In reply to comment #62)
> > I was informed this work has no future
>
> I'm surprised to hear that!

In its current form it's simply not maintainable -- either we need to somehow make it something that can be understood, or we cannot reasonably use it. ES6 is likely to pick up new syntactic forms, and it is hard to see how someone could add support for any new syntax in this grammar. The problem is that it is effectively a single large piece of spaghetti, and nothing I can think of really resolves this issue. The alternatives (to me) seem to be either writing something that is actually a recursive descent parser, or automatically generating a parser of this form.
Geoffrey Garen
Comment 68 2010-02-22 12:01:47 PST
> The alternatives (to me) seem to be either writing > something that is actually a recursive descent parser, or automatically > generating a parser of this form. Agreed.
Geoffrey Garen
Comment 69 2010-02-22 12:03:59 PST
> This is only half of the story. The real issue here is not the present, but the > future of JavaScript. The strict mode defined by ECMA requires an LL parser. > This example was mentioned before: > > for (var a in b) for (var c in a) { } > > According to the bison parser, 'c' is defined first, and 'a' later, which is > false. This is a nature of a bottom-up parser (right recursive LALR grammar), > and probably cannot be helped. In strict mode, the "var id" definition must > precede the first use of "id". The above statement is valid in strict mode, > since "var a" is defined before it is used. However, this check cannot be done > during the parsing in a bison parser. Post parsing checks takes extra time, and > it would be really hard for nested function bodies, since no AST is generated > for them. Interesting. Thanks for pointing that out. I don't see an obvious good way to solve that problem with a bottom-up parser.
Zoltan Herczeg
Comment 70 2010-03-23 03:43:00 PDT
Created attachment 51410 [details] Version 4.1
Zoltan Herczeg
Comment 71 2010-05-20 23:34:51 PDT
Created attachment 56676 [details] dump AST current version Latest dump AST patch
Zoltan Herczeg
Comment 72 2010-05-21 06:20:54 PDT
Created attachment 56704 [details] latest version of my recursive descent parser

Here comes an updated version of our recursive descent JS parser. It's twice as fast as the bison-generated one. Please have a look.

TEST                          COMPARISON       FROM               TO                DETAILS
=============================================================================
** TOTAL **:                  2.17x as fast    75.4ms +/- 0.9%    34.7ms +/- 1.4%   significant
=============================================================================
  jquery:                     2.29x as fast    11.9ms +/- 1.9%    5.2ms +/- 5.8%    significant
    1.3.2:                    2.29x as fast    11.9ms +/- 1.9%    5.2ms +/- 5.8%    significant
  mootools:                   2.10x as fast    12.4ms +/- 3.0%    5.9ms +/- 3.8%    significant
    1.2.2-core-nc:            2.10x as fast    12.4ms +/- 3.0%    5.9ms +/- 3.8%    significant
  prototype:                  2.12x as fast    13.8ms +/- 2.2%    6.5ms +/- 5.8%    significant
    1.6.0.3:                  2.12x as fast    13.8ms +/- 2.2%    6.5ms +/- 5.8%    significant
  concat:                     2.18x as fast    37.3ms +/- 0.9%    17.1ms +/- 1.3%   significant
    jquery-mootools-prototype: 2.18x as fast   37.3ms +/- 0.9%    17.1ms +/- 1.3%   significant

(Oliver mentioned on IRC that there are plans for writing a recursive descent parser. Why implement another one if there is an already working solution?)
Oliver Hunt
Comment 73 2010-05-21 08:05:46 PDT
(In reply to comment #72) > Created an attachment (id=56704) [details] > latest version of my recursive descent parser > > Here comes an updated version of our recursive descent JS parser. It's twice as fast as the bison-generated one. Please, have a look. > > TEST COMPARISON FROM TO DETAILS > > ============================================================================= > > ** TOTAL **: 2.17x as fast 75.4ms +/- 0.9% 34.7ms +/- 1.4% significant > > ============================================================================= > > jquery: 2.29x as fast 11.9ms +/- 1.9% 5.2ms +/- 5.8% significant > 1.3.2: 2.29x as fast 11.9ms +/- 1.9% 5.2ms +/- 5.8% significant > > mootools: 2.10x as fast 12.4ms +/- 3.0% 5.9ms +/- 3.8% significant > 1.2.2-core-nc: 2.10x as fast 12.4ms +/- 3.0% 5.9ms +/- 3.8% significant > > prototype: 2.12x as fast 13.8ms +/- 2.2% 6.5ms +/- 5.8% significant > 1.6.0.3: 2.12x as fast 13.8ms +/- 2.2% 6.5ms +/- 5.8% significant > > concat: 2.18x as fast 37.3ms +/- 0.9% 17.1ms +/- 1.3% significant > jquery-mootools-prototype: 2.18x as fast 37.3ms +/- 0.9% 17.1ms +/- 1.3% significant > > (Oliver mentioned on IRC that there are plans for writing a recursive descent parser. Why to implement another one if here is an already working solution?) I just find this parser very difficult to follow -- it's possible my parser is insufficiently complete to show the complexity present in this one (my parser is still very very early in its development) but I honestly have difficulty following the code here. Once i have more of the grammar hooked up in my parser (i have yet to get a complete parser for the expression tree written) i'll look at the complexity
Akos Kiss
Comment 74 2010-05-21 10:11:03 PDT
(In reply to comment #73)
> > (Oliver mentioned on IRC that there are plans for writing a recursive descent parser. Why to implement another one if here is an already working solution?)
>
> I just find this parser very difficult to follow -- it's possible my parser is insufficiently complete to show the complexity present in this one (my parser is still very very early in its development) but I honestly have difficulty following the code here. Once i have more of the grammar hooked up in my parser (i have yet to get a complete parser for the expression tree written) i'll look at the complexity

Ohh, is this serious? A whole parser will be reimplemented (how long will it take?), just because the current one (which is ready for use, and which is fast!) is complex?

I have a deja vu... It seems to me that we have played this game before. We were working on the ARM JIT and got our solutions rejected again and again, just to find out later that someone at Apple was already working on something similar.

Long live the open source mentality... :(

P.S.: We are happy to maintain the code we write. We have been around for 2 years, but we are still struggling to get accepted as serious contributors...
Oliver Hunt
Comment 75 2010-05-21 10:25:38 PDT
(In reply to comment #74)
> (In reply to comment #73)
> > > (Oliver mentioned on IRC that there are plans for writing a recursive descent parser. Why to implement another one if here is an already working solution?)
> >
> > I just find this parser very difficult to follow -- it's possible my parser is insufficiently complete to show the complexity present in this one (my parser is still very very early in its development) but I honestly have difficulty following the code here. Once i have more of the grammar hooked up in my parser (i have yet to get a complete parser for the expression tree written) i'll look at the complexity
>
> Ohh, is this serious? A whole parser will be reimplemented (how long will it take?), just because the current one (which is ready for use, and which is fast!) is complex?

No idea, I just need to get enough implemented to get an idea as to whether the complexity of this parser is reasonable. We always attempt to avoid unnecessarily complex code, and this parser seems very strangely constructed.

> I have a deja vu... It seems to me that we played the same game before. We've been working on the ARM JIT and got our solutions rejected again and again, just to find out later on that someone at Apple is already working on something similar.

In the case of ARM we were working on a thumb2 version, and I believe the discussions between you guys and Gavin led to what was a better design overall. Anyhoo, this situation is entirely different -- I started my recursive descent parser only recently; I've only spent around 3 hours on it so far, so it can't even parse an entire expression tree yet. I also told Zoltan the first time I saw him on IRC again.

> Long live the open source mentality... :(
>
> P.S.: We are happy to maintain the code we write. We have been around for 2 years, but we are still strugling to get us accepted as serious contributors...

We do consider you guys to be serious contributors; we frequently talk to Zoltan and Ossy on IRC. However, we really don't like big complex code chunks landing unless we're sure that the complexity is necessary -- this applies to everyone, including other people from Apple.
Akos Kiss
Comment 76 2010-05-22 02:11:31 PDT
(In reply to comment #75)
> > I have a deja vu... It seems to me that we played the same game before. We've been working on the ARM JIT and got our solutions rejected again and again, just to find out later on that someone at Apple is already working on something similar.
>
> In the case of ARM we were working on a thumb2 version and i believe the discussions between you guys and gavin led to what was a better design overall.

Let's say it's a different design. (We are still investigating what we might have lost - in terms of performance - by conforming to the macro assembler.)

> > P.S.: We are happy to maintain the code we write. We have been around for 2 years, but we are still strugling to get us accepted as serious contributors...
>
> We do consider you guys to be serious contributors, we frequently talk to zoltan and ossy on irc, however we really don't like big complex code chunks landing unless we're sure that the complexity is necessary -- this applies to everyone, including other people from Apple.

Well, I've never seen a feature implemented twice just to check its complexity. (Although I can imagine other reasons for the reimplementation...)

Anyway, this is not the right forum to discuss this, and I don't want to start a flame war either. Perhaps we can talk about this in person next week...
Oliver Hunt
Comment 77 2010-05-22 03:07:26 PDT
(In reply to comment #76)
> (In reply to comment #75)
> > > I have a deja vu... It seems to me that we played the same game before. We've been working on the ARM JIT and got our solutions rejected again and again, just to find out later on that someone at Apple is already working on something similar.
> >
> > In the case of ARM we were working on a thumb2 version and i believe the discussions between you guys and gavin led to what was a better design overall.
>
> Let's say, it's a different design. (We are still investigating what we might have lost - in terms of performance - by conforming to the macro assembler.)

When accounting for performance you need to consider all the optimisations we have made that would need to be duplicated if you weren't using the macro assembler.

> > > P.S.: We are happy to maintain the code we write. We have been around for 2 years, but we are still strugling to get us accepted as serious contributors...
> >
> > We do consider you guys to be serious contributors, we frequently talk to zoltan and ossy on irc, however we really don't like big complex code chunks landing unless we're sure that the complexity is necessary -- this applies to everyone, including other people from Apple.
>
> Well, I've never seen before a feature implemented twice just to check it's complexity. (Although, I can imagine other reasons for the reimplementation...)

I feel that the implementation attached to this patch is unnecessarily complicated, and as a project we try not to take code that is more complex than necessary. The problem with complex patches is that they are difficult to understand, and therefore difficult to improve. This may not seem like a problem, but it means that for all we know there could be substantial performance improvements to be made that are hidden behind the complexity. This is why we always strive for simple code, especially on the initial commit (alas, complexity always creeps in over time, but starting off with "complex" just makes it worse).

> Anyway, this is not the right forum to discuss this. And I don't want to start a flame war either. Perhaps we can have a talk about this next week in person...

In all honesty, the way to avoid a flame war is to come on IRC and actually discuss issues, as that allows comments to be elaborated upon and so reduces the likelihood of comments being misconstrued as insulting or passive-aggressive, which in all honesty is what your initial comment came across as.
Zoltan Herczeg
Comment 78 2010-05-22 05:06:23 PDT
> I feel that the implementation attached to this patch is unnecessarily complicated, and as a project we try not to take code that is more complex than necessary.

This is my fault. As you may have guessed, this is not my first parser, and I just threw in heavy optimizations that had proven useful before, without thinking about others. Probably the same thing would happen if I had to reimplement the JIT for a different project: I would just use the experience gained here to make an effective implementation, and others would not understand it, since they don't know the steps I skipped. The parser is fast and has a small memory footprint (including the machine stack), but it is true that it is not easy to understand. Good documentation would be useful here (and not just here - newcomers always find WebKit difficult to understand). Anyway, the parser is not a time-critical project; we should evaluate the options before we decide. And share the experiences on irc :)
Zoltan Horvath
Comment 79 2010-05-22 09:14:37 PDT
Hey, this patch only adds code to WebKit - and what it adds is disabled by default - it doesn't remove anything. For the present, it doesn't demand additional work from other contributors, so I don't think it is asking much to add it to the trunk. We can work on the code complexity, but there is plenty of code in WebKit that is understood only by a couple of people; I can't see the difference here. If somebody wants to improve this area, they will understand it sooner or later, and with good documentation on the trac that can be easy enough. If it is in the trunk, people can try it out easily and make their own decision about using it. If we see that the community doesn't like it, there is still a chance to remove it. I know that WebKit is not an experimental project and we don't want many experimental features in WebKit, but this parser isn't experimental either. I would be glad to see this in the trunk, because I don't see any counter-arguments. Zoltan
Oliver Hunt
Comment 80 2010-05-24 02:11:59 PDT
Created attachment 56863 [details] Recursive descent parser

Here's my weekend's work -- it's a simple recursive descent parser for JS. It is not complete -- it's missing const, debugger, accessors in object literals, and with. Additionally I haven't yet had a chance to run Zoltan's AST dumper, which will be necessary, as I'm fairly sure I've screwed up some of the line and character positions for exceptions, etc. It also doesn't have any real stack guards -- there's a stack checker there at the moment, but I'm basically just using that to judge stack usage, so it simply crashes out if the arbitrary limit is reached.

So with those caveats, I figure I should point out the goodness: I believe this is much more understandable and maintainable than the iterative parser -- it is also much less code, though not by as much as the patch sizes imply, since the iterative parser includes a significant amount of comments. It is currently ~5% slower than the iterative parser, but still much faster than bison -- there may be some trivial improvements to be made, but I'm hesitant to make claims like that, as I don't believe Zoltan has done any real perf optimisations to the iterative parser either (is this correct?).

Anyhoo, I just figured I'd put this up in response to the "how long will it take?" question. Answer: 1 weekend :-D --Oliver
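(For readers who want to see the shape of what is being compared here: below is a minimal sketch of a recursive descent statement parser with the kind of crude depth guard described above. It is an illustration only, not code from attachment 56863; the token names, node type, helper functions, and the depth limit are all invented for the sketch.)

    // Minimal sketch of a recursive descent statement parser with a crude
    // depth guard that simply gives up at an arbitrary nesting limit.
    // All names here (Token, Node, kMaxDepth, the helpers) are hypothetical.
    enum Token { IfToken, OpenParen, CloseParen, IdentToken, EofToken };
    struct Node { };

    class MiniParser {
    public:
        Node* parseStatement()
        {
            if (++m_depth > kMaxDepth) { // arbitrary limit, as described above
                m_error = true;
                return 0;
            }
            Node* result = 0;
            switch (currentToken()) {
            case IfToken:
                result = parseIfStatement(); // recurses through the branches
                break;
            default:
                result = parseExpressionStatement();
                break;
            }
            --m_depth;
            return m_error ? 0 : result;
        }

    private:
        Node* parseIfStatement()
        {
            next(); // consume 'if'
            if (!consume(OpenParen))
                return 0;
            Node* condition = parseExpression();
            if (!consume(CloseParen))
                return 0;
            Node* body = parseStatement(); // one native stack frame per nesting level
            return makeIfNode(condition, body);
        }

        // Stubs so the sketch stands alone.
        Node* parseExpressionStatement() { next(); return new Node; }
        Node* parseExpression() { next(); return new Node; }
        Node* makeIfNode(Node*, Node*) { return new Node; }
        Token currentToken() const { return m_token; }
        void next() { m_token = EofToken; }
        bool consume(Token expected) { if (m_token != expected) { m_error = true; return false; } next(); return true; }

        static const unsigned kMaxDepth = 4096;
        unsigned m_depth = 0;
        bool m_error = false;
        Token m_token = IfToken;
    };

Each nested construct still costs one native stack frame, which is why the stack-guard discussion in the later comments matters.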
Zoltan Herczeg
Comment 81 2010-05-24 02:24:07 PDT
Created attachment 56864 [details] latest patch

Unfortunately, I submitted the wrong patch to bugzilla (version 3 instead of version 4.3).
Zoltan Herczeg
Comment 82 2010-05-24 02:30:27 PDT
> I don't believe Zoltan has done any real perf optimisations to the iterative parser either (is this correct?).

True. My parser even mimics bugs and other quirks that bison has. The focus of my patch was compatibility, and making it possible to apply heavy optimizations after the switch.
Oliver Hunt
Comment 83 2010-05-24 03:00:50 PDT
(In reply to comment #82)
> > I don't believe Zoltan has done any real perf optimisations to the iterative parser either (is this correct?).
>
> True. My parser even mimics bugs and other quirks that bison has. The focus of my patch was compatibility, and making it possible to apply heavy optimizations after the switch.

I agree -- there's so much that can be made better in both, but whichever parser we end up using, we probably want to leave it basically alone for a few weeks to settle in the nightlies before we start making changes to the AST. I'm trapped in meetings from hell for the next two days, so I'll probably do some more tidying of my parsing code during them and will post any significant updates. --Oliver
Akos Kiss
Comment 84 2010-05-24 09:33:59 PDT
(In reply to comment #80)
> Anyhoo, I just figured I'd put this up in response to the "how long will it take?" question. Answer: 1 weekend :-D

Aye, but that's not fair... it's incomplete... ;) -Akos
Oliver Hunt
Comment 85 2010-05-28 23:25:54 PDT
Created attachment 57405 [details] More complete parser

Okay, this now passes everything except one test in fast/js that is actually incorrect (the automatic semicolon insertion in yacc is incorrect -- I'm unsure what Zoltan's patch does for that test). It still has issues (ignoring perf): it uses more stack space than I would like, though its stack usage seems comparable to SpiderMonkey and V8 (quite a bit better than Firefox for the most part, on par with V8's). The stack guard is not thread safe, for no reason other than me being a slacker. I've also realised that the parsing of for-loops is unnecessarily obtuse. Once done with those, I'm going to go through and fix up the rest of the squirrelly code and then see what can be done to make it nicer.
Darin Adler
Comment 86 2010-05-29 21:04:47 PDT
Comment on attachment 57405 [details] More complete parser

> +#define parseError() do { m_error = true; return 0; } while (0)
> +#define failIfFalse(cond) do { if (!(cond)) parseError(); } while(0)
> +#define failIfTrue(cond) do { if ((cond)) parseError(); } while(0)
> +#define assertMatch(tok) ASSERT(match(tok))
> +#define checkStackLimit() do { failIfFalse(allowRecursion()); } while(0)

Why can't these all be inline member functions instead of macros?

> +// Macros to make the more common TreeBuilder types a little less verbose
> +#define TreeStatement typename TreeBuilder::Statement
> +#define TreeExpression typename TreeBuilder::Expression
> +#define TreeFormalParameterList typename TreeBuilder::FormalParameterList
> +#define TreeSourceElements typename TreeBuilder::SourceElements

Why can't these be typedefs instead of macros?
Oliver Hunt
Comment 87 2010-05-29 23:44:06 PDT
(In reply to comment #86)
> (From update of attachment 57405 [details])
> > +#define parseError() do { m_error = true; return 0; } while (0)
> > +#define failIfFalse(cond) do { if (!(cond)) parseError(); } while(0)
> > +#define failIfTrue(cond) do { if ((cond)) parseError(); } while(0)
> > +#define assertMatch(tok) ASSERT(match(tok))
> > +#define checkStackLimit() do { failIfFalse(allowRecursion()); } while(0)
>
> Why can't these all be inline member functions instead of macros?

Because they return from the containing function; if they were functions, they couldn't do this.

> > +// Macros to make the more common TreeBuilder types a little less verbose
> > +#define TreeStatement typename TreeBuilder::Statement
> > +#define TreeExpression typename TreeBuilder::Expression
> > +#define TreeFormalParameterList typename TreeBuilder::FormalParameterList
> > +#define TreeSourceElements typename TreeBuilder::SourceElements
>
> Why can't these be typedefs instead of macros?

Because they don't have a single context in which they could be typedef'd -- TreeBuilder is a template parameter passed to the parsing functions, not to the parser itself.
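(To make the macro-versus-inline-function point concrete, here is a small self-contained sketch. The parseError/failIfFalse macros are the ones quoted above; the surrounding ParserSketch class and its helpers are invented for the illustration and are not the patch's code.)

    // Standalone illustration of why the failure helpers are macros: each use
    // has to be able to return 0 from the *calling* parse function on error.
    #define parseError() do { m_error = true; return 0; } while (0)
    #define failIfFalse(cond) do { if (!(cond)) parseError(); } while (0)

    struct Node { };

    class ParserSketch {
    public:
        Node* parseIfStatement()
        {
            // Expands to: if (!consume('(')) { m_error = true; return 0; }
            // i.e. the early return happens here, inside parseIfStatement itself.
            failIfFalse(consume('('));
            Node* condition = parseExpression();
            failIfFalse(condition);
            failIfFalse(consume(')'));
            return condition; // placeholder for building a real if-node
        }

    private:
        // An inline member function could set m_error, but could not force
        // parseIfStatement to return; every call site would still need an
        // explicit "if (m_error) return 0;" check.
        bool consume(char expected) { if (m_current != expected) return false; advance(); return true; }
        Node* parseExpression() { advance(); return new Node; }
        void advance() { m_current = 0; }

        bool m_error = false;
        char m_current = '(';
    };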
Oliver Hunt
Comment 88 2010-06-06 13:27:46 PDT
Created attachment 57981 [details] Updated parser

This parser is both speedy and complete -- it passes all tests, with the exception of the incorrect semicolon insertion test, and can run in very little stack space -- it currently uses stack guards that match the limit used by SpiderMonkey and V8.

That said, it has a real problem currently: lazy and non-lazy parsing use different amounts of stack space, so it's possible to construct examples where a function body can be parsed lazily, but not in the actual "I'd like an AST" pass. I'm unsure how to deal with this. The obvious solution (making it iterative) seems highly likely to lead to the kind of convoluted code I was aiming to avoid in the first place with this patch. Alternatively, we could restrict the depth of the lazy parsing more than the full parse, but I am fairly sure that it would still be possible to induce badness even in those cases. Another alternative would be to handle reparsing failures and produce a stack overflow, but we also need to reparse in response to an exception, so if the stack is exhausted when we go to reparse, once again we could end up with badness.

WWDC week is upon us once more, so I may be able to spend some time looking into solutions over the course of the week. I'm wondering if it's possible to separate statement and expression parsing and have two iterative parsers, for less complexity than a single merged parser. You'd still recurse, but you'd have bounded recursion -- basically, statement parsing can call the expression parser but can't recurse into the statement parser, and the expression parser would have to be completely iterative. I'm really not sure. /me shrugs
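(For context, a stack guard of the kind referred to here is commonly implemented by recording a limit derived from the stack position at entry and refusing to recurse once the current position crosses it. The sketch below is a generic illustration of that idea, not the implementation in attachment 57981; the 128 kB margin and the local-variable-address trick for estimating the stack pointer are assumptions.)

    // Generic sketch of a stack-position guard like the allowRecursion() check
    // used by the checkStackLimit() macro quoted earlier. Not the patch's code.
    #include <cstdint>

    class StackGuardSketch {
    public:
        explicit StackGuardSketch(uintptr_t availableBytes = 128 * 1024)
        {
            // Record where the stack is when parsing begins; on the common
            // platforms the stack grows downward, so the limit sits below it.
            m_limit = currentStackPosition() - availableBytes;
        }

        // Called before each recursive descent into a statement or expression;
        // returning false means "refuse to recurse" rather than crashing.
        bool allowRecursion() const { return currentStackPosition() > m_limit; }

    private:
        static uintptr_t currentStackPosition()
        {
            // The address of a local variable approximates the stack pointer.
            char probe;
            return reinterpret_cast<uintptr_t>(&probe);
        }

        uintptr_t m_limit;
    };

Because a lazy, syntax-check-only pass and a full AST-building pass burn different amounts of stack per nesting level, a fixed guard like this can succeed in one mode and trip in the other, which is the reparsing hazard described above.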
Oliver Hunt
Comment 89 2010-06-06 13:33:24 PDT
(In reply to comment #88)
> Created an attachment (id=57981) [details]
> Updated parser

The basic changes in this version of the parser are to make assignment chains iterative and to remove the large number of arguments to most of the functions, in order to bring a measure of sanity to things. It can now get through run-jsc-tests with 4k of stack space on 64-bit, I think it was.
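(As an illustration of what "making assignment chains iterative" means in general: a right-associative chain such as a = b = c = d can be parsed with a loop that collects the left-hand sides and then folds them, instead of recursing once per '='. The sketch below is generic and invented for illustration; the token and node types are placeholders, not the patch's types.)

    // Generic sketch of parsing a right-associative assignment chain iteratively.
    #include <vector>

    struct Expr { };
    enum Token { AssignToken, OtherToken };

    class AssignChainSketch {
    public:
        Expr* parseAssignmentExpression()
        {
            std::vector<Expr*> pendingTargets;

            Expr* expr = parseConditionalExpression();
            // Collect left-hand sides in a loop while '=' keeps appearing.
            while (currentToken() == AssignToken) {
                pendingTargets.push_back(expr);
                next(); // consume '='
                expr = parseConditionalExpression();
            }

            // Fold right to left: the final expression is the innermost RHS.
            while (!pendingTargets.empty()) {
                expr = makeAssignNode(pendingTargets.back(), expr);
                pendingTargets.pop_back();
            }
            return expr;
        }

    private:
        // Stubs so the sketch stands alone.
        Expr* parseConditionalExpression() { next(); return new Expr; }
        Expr* makeAssignNode(Expr*, Expr* rhs) { return rhs; }
        Token currentToken() const { return m_token; }
        void next() { m_token = OtherToken; }

        Token m_token = OtherToken;
    };

The depth of the chain then costs heap space in the vector rather than native stack frames.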
Zoltan Herczeg
Comment 90 2010-06-17 05:00:08 PDT
Created attachment 58980 [details] faster parser

After a long silence, I decided to improve my parser a little because the Qt guys are interested in it:
https://lists.webkit.org/pipermail/webkit-qt/2010-June/000632.html

This is great news for me, and I would like to thank them again.

Practically, I reordered some parts of the parser and improved the switches a little. It seems "prototype" liked it.

TEST                         COMPARISON        FROM              TO                DETAILS
=============================================================================
** TOTAL **:                 1.033x as fast    34.4ms +/- 2.0%   33.3ms +/- 1.8%   significant
=============================================================================

  jquery:                    -                 5.3ms +/- 6.5%    5.2ms +/- 5.8%
    1.3.2:                   -                 5.3ms +/- 6.5%    5.2ms +/- 5.8%

  mootools:                  ??                5.3ms +/- 6.5%    5.4ms +/- 6.8%    not conclusive: might be *1.019x as slow*
    1.2.2-core-nc:           ??                5.3ms +/- 6.5%    5.4ms +/- 6.8%    not conclusive: might be *1.019x as slow*

  prototype:                 1.097x as fast    6.8ms +/- 4.4%    6.2ms +/- 4.9%    significant
    1.6.0.3:                 1.097x as fast    6.8ms +/- 4.4%    6.2ms +/- 4.9%    significant

  concat:                    1.030x as fast    17.0ms +/- 0.0%   16.5ms +/- 2.3%   significant
    jquery-mootools-prototype: 1.030x as fast  17.0ms +/- 0.0%   16.5ms +/- 2.3%   significant
Oliver Hunt
Comment 91 2010-06-17 09:50:33 PDT
(In reply to comment #90)
> Created an attachment (id=58980) [details]
> faster parser
>
> After a long silence, I decided to improve my parser a little because the Qt guys are interested in it:
> https://lists.webkit.org/pipermail/webkit-qt/2010-June/000632.html
>
> This is great news for me, and I would like to thank them again.
>
> Practically, I reordered some parts of the parser and improved the switches a little. It seems "prototype" liked it.

The issue with the iterative parser isn't speed -- it was already very fast. The issue is maintainability: the code is very difficult to follow, and it's hard to see how that can be improved. I'll attach a slightly improved version of my parser in a bit, but I have been distracted by other stuff recently :-/
Zoltan Herczeg
Comment 92 2010-06-17 23:44:32 PDT
> The issue with the iterative parser isn't speed -- it was already very fast. The issue is maintainability: the code is very difficult to follow, and it's hard to see how that can be improved.

I think my code is not as bad as the number of "maintainability" mentions implies. It is a state machine - a kind of vertical solution, if your code is the horizontal one. Anyway, although we have a lot of guys here who are experts on code quality, I have never heard that there is a high correlation between code quality and successful projects.
Geoffrey Garen
Comment 93 2010-06-18 12:30:01 PDT
(In reply to comment #92)
> I have never heard that there is a high correlation between code quality and successful projects.

Code quality is an explicitly stated goal of the WebKit project. (See http://webkit.org/projects/goals.html.) If you believe that the WebKit project should change its goals to exclude code quality, please bring the issue up on webkit-dev.

For the purposes of this bug, though, I propose working within the WebKit project goals as they currently stand.
Akos Kiss
Comment 94 2010-06-18 12:48:00 PDT
(In reply to comment #93)
> (In reply to comment #92)
> > I have never heard that there is a high correlation between code quality and successful projects.
>
> Code quality is an explicitly stated goal of the WebKit project. (See http://webkit.org/projects/goals.html.) If you believe that the WebKit project should change its goals to exclude code quality, please bring the issue up on webkit-dev.
>
> For the purposes of this bug, though, I propose working within the WebKit project goals as they currently stand.

I do think that was only an unfortunate comment. Certainly, we don't want to lower but to increase the quality of WebKit's code. Where we don't necessarily agree is whether Zoltan's version is more _complex_ and/or less _maintainable_ than Oliver's implementation. But hopefully both patches are of high _quality_. Thus, should either of them be selected for landing, WebKit can only benefit from it.
Zoltan Herczeg
Comment 95 2010-06-18 14:16:30 PDT
> Code quality is an explicitly stated goal of the WebKit project. (See http://webkit.org/projects/goals.html.) If you believe that the WebKit project should change its goals to exclude code quality, please bring the issue up on webkit-dev.
>
> For the purposes of this bug, though, I propose working within the WebKit project goals as they currently stand.

Ok, that comment was too much, I know, but hearing this "unmaintainable" mantra over and over makes me feel that I have no clue about software development. I still think that my work is not that bad, and you could help me improve it.
Oliver Hunt
Comment 96 2010-06-23 18:17:12 PDT
WebKit Review Bot
Comment 97 2010-06-23 18:23:01 PDT
Attachment 59594 [details] did not pass style-queue:

Failed to run "['WebKitTools/Scripts/check-webkit-style', '--no-squash']" exit_code: 1
Last 3072 characters of output:
itespace/indent] [4]
JavaScriptCore/parser/JSParser.cpp:1054: Place brace on its own line for function definitions. [whitespace/braces] [4]
JavaScriptCore/parser/JSParser.cpp:1056: A case label should not be indented, but line up with its switch statement. [whitespace/indent] [4]
JavaScriptCore/parser/JSParser.cpp:1157: A case label should not be indented, but line up with its switch statement. [whitespace/indent] [4]
JavaScriptCore/parser/JSParser.cpp:1165: More than one command on the same line [whitespace/newline] [4]
JavaScriptCore/parser/JSParser.cpp:1208: One space before end of line comments [whitespace/comments] [5]
JavaScriptCore/parser/JSParser.cpp:1208: Should have a space between // and comment [whitespace/comments] [4]
JavaScriptCore/parser/JSParser.cpp:1267: A case label should not be indented, but line up with its switch statement. [whitespace/indent] [4]
JavaScriptCore/parser/JSParser.cpp:1319: One line control clauses should not use braces. [whitespace/braces] [4]
JavaScriptCore/parser/JSParser.cpp:1378: A case label should not be indented, but line up with its switch statement. [whitespace/indent] [4]
JavaScriptCore/parser/JSParser.cpp:1442: A case label should not be indented, but line up with its switch statement. [whitespace/indent] [4]
JavaScriptCore/parser/JSParser.cpp:1463: A case label should not be indented, but line up with its switch statement. [whitespace/indent] [4]
JavaScriptCore/parser/ASTBuilder.h:37: Missing space before { [whitespace/braces] [5]
JavaScriptCore/parser/ASTBuilder.h:60: Missing space before { [whitespace/braces] [5]
JavaScriptCore/parser/ASTBuilder.h:134: More than one command on the same line [whitespace/newline] [4]
JavaScriptCore/parser/ASTBuilder.h:140: Place brace on its own line for function definitions. [whitespace/braces] [4]
JavaScriptCore/parser/ASTBuilder.h:759: A case label should not be indented, but line up with its switch statement. [whitespace/indent] [4]
JavaScriptCore/parser/ASTBuilder.h:846: An else statement can be removed when the prior "if" concludes with a return, break, continue or goto statement. [readability/control_flow] [4]
JavaScriptCore/parser/ASTBuilder.h:855: An else statement can be removed when the prior "if" concludes with a return, break, continue or goto statement. [readability/control_flow] [4]
JavaScriptCore/parser/Lexer.cpp:29: Alphabetical sorting problem. [build/include_order] [4]
JavaScriptCore/parser/JSParser.h:110: first_line is incorrectly named. Don't use underscores in your identifier names. [readability/naming] [4]
JavaScriptCore/parser/JSParser.h:111: last_line is incorrectly named. Don't use underscores in your identifier names. [readability/naming] [4]
JavaScriptCore/parser/JSParser.h:112: first_column is incorrectly named. Don't use underscores in your identifier names. [readability/naming] [4]
JavaScriptCore/parser/JSParser.h:113: last_column is incorrectly named. Don't use underscores in your identifier names. [readability/naming] [4]
Total errors found: 38 in 17 files

If any of these errors are false positives, please file a bug against check-webkit-style.
Oliver Hunt
Comment 98 2010-06-23 19:11:43 PDT
WebKit Review Bot
Comment 99 2010-06-23 19:15:14 PDT
Attachment 59603 [details] did not pass style-queue:

Failed to run "['WebKitTools/Scripts/check-webkit-style', '--no-squash']" exit_code: 1
JavaScriptCore/parser/SyntaxChecker.h:148: More than one command on the same line [whitespace/newline] [4]
JavaScriptCore/parser/SyntaxChecker.h:149: More than one command on the same line [whitespace/newline] [4]
JavaScriptCore/parser/SyntaxChecker.h:153: More than one command on the same line [whitespace/newline] [4]
JavaScriptCore/parser/JSParser.cpp:35: Alphabetical sorting problem. [build/include_order] [4]
JavaScriptCore/parser/JSParser.cpp:36: Alphabetical sorting problem. [build/include_order] [4]
JavaScriptCore/parser/Lexer.cpp:29: Alphabetical sorting problem. [build/include_order] [4]
JavaScriptCore/parser/JSParser.h:110: first_line is incorrectly named. Don't use underscores in your identifier names. [readability/naming] [4]
JavaScriptCore/parser/JSParser.h:111: last_line is incorrectly named. Don't use underscores in your identifier names. [readability/naming] [4]
JavaScriptCore/parser/JSParser.h:112: first_column is incorrectly named. Don't use underscores in your identifier names. [readability/naming] [4]
JavaScriptCore/parser/JSParser.h:113: last_column is incorrectly named. Don't use underscores in your identifier names. [readability/naming] [4]
Total errors found: 10 in 17 files

If any of these errors are false positives, please file a bug against check-webkit-style.
Gavin Barraclough
Comment 100 2010-06-23 21:12:19 PDT
Comment on attachment 59603 [details] Patch

Comparing Identifier == char* in createGetterOrSetterProperty makes me a little sad.

    bool consume(int expected)
This looks a little screwy?

    if (!result) {
        m_error = result;
Is this what you really mean here? If so, it's probably clearer written as:
    if (!result) {
        m_error = false;
Or is this inverted? If so, I think you want:
    if (!result)
        fail();
?

    202 };
    203 int jsParse(JSGlobalData* globalData)
pls to be add a newline here.

    229 }
    230
    231
    232 bool JSParser::parseProgram()
pls to be remove a newline here. ;-D

    597     consumeOrFail(CLOSEBRACE);
    598     return context.createSwitchStatement(expr, firstClauses, defaultClause, secondClauses, startLine, endLine);
    599
    600 }
and again.

    #define ENABLE_RECURSIVE_PARSE 1
Will this go away completely in your next patch? - I strongly hope so.

r+, please please please to be removing the old parser asap.
Oliver Hunt
Comment 101 2010-06-23 21:19:48 PDT
WebKit Review Bot
Comment 102 2010-06-23 21:44:28 PDT
http://trac.webkit.org/changeset/61732 might have broken Qt Linux Release
Eric Seidel (no email)
Comment 103 2010-06-23 23:08:11 PDT
Seems to have borked lots of bots. :(
Csaba Osztrogonác
Comment 104 2010-06-24 06:25:33 PDT
(In reply to comment #103)
> Seems to have borked lots of bots. :(

Tiger is still dead :((
http://build.webkit.org/builders/Tiger%20Intel%20Release/builds/13588