RESOLVED FIXED 34019
Custom-written JavaScript parser
https://bugs.webkit.org/show_bug.cgi?id=34019
Summary Custom-written JavaScript parser
Zoltan Herczeg
Reported 2010-01-22 13:41:11 PST
A bug entry for a custom-written JavaScript parser. FAQ:

- Why? I felt I could write a faster, specialized parser than bison.
- When did the work start? Start of January 2010 (3 weeks ago).
- Main components?
  - grammar: parsing and generating nodes
  - syntax checker: just parsing (used for checking function bodies to find the terminating '}')
- Can the grammar and syntax checker be merged? It would help maintain the code. Possible. I was thinking about that when I actually started to separate the grammar from the syntax checker. However, there are some key differences between them (for example, the syntax checker has to check nested function bodies), and the syntax checker has much simpler (and faster) code. When you open a simple page (like www.google.com), you also get a huge JS file with many functions. Most of them are never executed (browser-specific code for a different browser), but they are still parsed (syntax checked). I would not like to give up that speed advantage, and the code of the grammar is ugly anyway :(
- Advantages compared to bison? Speed. According to my measurements the grammar is 3 times and the syntax checker is 4 times faster than bison.
- Disadvantages? Readability. Although the code in grammar.y is ugly, it is repetitive and easy to understand.
- Focus of the work? Compatibility with the existing bison parser (and with the standard as well), especially detecting parse errors in the same way.
- But you skip possible optimizations this way... I know. But this is only a first step on a long road. When this parser is tested and goes mainline, we can think about improving it. There is actually a lot of room for such improvements, but not for now.
- Testing? See dump-AST.patch. You can dump the AST and compare the old and new ones. Right now 127 javascriptcore tests fail. Missing features: object literals, switch, the arguments feature (no idea).
- Supported build systems? Qt for now, but other ports are easy, since you only have to add JavaScriptCore/parser/Grammar.cpp, .h, SyntaxChecker.cpp, .h, GrammarConsts.h.
- Any help? See the comments in GrammarConsts.h.
- Anything else? See the LEGACY comments in the code. They could be improved soon, but that would break AST compatibility. In any case, we should do something about the divot points, since every node uses them in a different way. For now, compatibility is the first priority, but later we should define general rules for them, and all nodes should follow those rules.
Attachments
Dump AST (54.65 KB, patch)
2010-01-22 13:41 PST, Zoltan Herczeg
no flags
A new JavaScript Grammar (draft) (286.47 KB, patch)
2010-01-22 13:42 PST, Zoltan Herczeg
no flags
Improved Dump AST (55.76 KB, patch)
2010-01-25 06:33 PST, Zoltan Herczeg
no flags
A new JavaScript Grammar, which passes all jscore regression tests. (300.35 KB, patch)
2010-01-25 06:38 PST, Zoltan Herczeg
no flags
Selectable JavaScript grammar (178.63 KB, patch)
2010-01-26 03:19 PST, Zoltan Herczeg
no flags
Selectable JavaScript grammar (+mac build support) (184.80 KB, patch)
2010-01-26 06:40 PST, Zoltan Herczeg
oliver: review-
Next attempt (191.46 KB, patch)
2010-01-27 06:12 PST, Zoltan Herczeg
oliver: review-
Layout test for parser syntax checking (36.03 KB, patch)
2010-01-28 06:29 PST, Zoltan Herczeg
no flags
Another attempt (192.62 KB, patch)
2010-01-29 06:14 PST, Zoltan Herczeg
oliver: review-
Layout test for parser syntax checking (patch 2) (37.82 KB, patch)
2010-01-29 06:15 PST, Zoltan Herczeg
no flags
Next draft (211.49 KB, patch)
2010-02-03 06:35 PST, Zoltan Herczeg
oliver: review-
Merging syntax checker and grammar (177.49 KB, patch)
2010-02-05 05:20 PST, Zoltan Herczeg
no flags
template based approach (204.85 KB, patch)
2010-02-09 06:40 PST, Zoltan Herczeg
no flags
jquery-yacc.zip (6.74 MB, application/zip)
2010-02-09 15:16 PST, Geoffrey Garen
no flags
Updated patch (199.42 KB, patch)
2010-02-12 01:05 PST, Zoltan Herczeg
no flags
profiling data (152.00 KB, application/octet-stream)
2010-02-12 01:08 PST, Zoltan Herczeg
no flags
profiling data (maxFunctionDepth = 32) (151.04 KB, application/octet-stream)
2010-02-12 05:02 PST, Zoltan Herczeg
no flags
Version 4.1 (199.41 KB, patch)
2010-03-23 03:43 PDT, Zoltan Herczeg
no flags
dump AST current version (55.76 KB, patch)
2010-05-20 23:34 PDT, Zoltan Herczeg
no flags
latest version of my recursive descent parser (204.85 KB, patch)
2010-05-21 06:20 PDT, Zoltan Herczeg
no flags
Recursive descent parser (114.85 KB, patch)
2010-05-24 02:11 PDT, Oliver Hunt
no flags
latest patch (199.74 KB, patch)
2010-05-24 02:24 PDT, Zoltan Herczeg
no flags
More complete parser (117.89 KB, patch)
2010-05-28 23:25 PDT, Oliver Hunt
no flags
Updated parser (116.93 KB, patch)
2010-06-06 13:27 PDT, Oliver Hunt
no flags
faster parser (198.59 KB, patch)
2010-06-17 05:00 PDT, Zoltan Herczeg
no flags
Patch (132.06 KB, patch)
2010-06-23 18:17 PDT, Oliver Hunt
no flags
Patch (130.70 KB, patch)
2010-06-23 19:11 PDT, Oliver Hunt
barraclough: review+
Zoltan Herczeg
Comment 1 2010-01-22 13:41:45 PST
Created attachment 47222 [details] Dump AST
Zoltan Herczeg
Comment 2 2010-01-22 13:42:40 PST
Created attachment 47223 [details] A new JavaScript Grammar (draft)
Oliver Hunt
Comment 3 2010-01-22 14:35:50 PST
I'm unsure why the syntax checker is separate from the parser. Syntax checking is basically answering the question "can I parse this?". There are semantic checks that we will want to do while parsing:
- the various "features" we track
- variable and function accumulation
- partial AST generation
New things we _must_ be able to do in the new parser: strict mode identification needs to be done during the parse stage; it alters the state of the parser to reject constructs that lax mode would allow, and produces parse-time errors instead. For efficiency reasons this basically forces us into a recursive-descent parser :-/
Zoltan Herczeg
Comment 4 2010-01-22 14:57:39 PST
(In reply to comment #3)
> I'm unsure why the syntax checker is separate from the parser. Syntax checking
> is basically answering the question "can i parse this".

The grammar does both syntax checking and node generation. The syntax checker only checks syntax, and it is used only for skipping nested function bodies. That is what I tried to explain: they have a lot in common, but they have differences as well.

> There are semantic checks that we will want to do while parsing:
> the various "features" we track
> variable and function accumulation
> partial AST generation

That is what my grammar does :) It is "binary compatible" with the bison parser. The only thing which is not tracked is the "arguments feature", because I don't really understand its purpose yet :(

> New things we _must_ be able to do in the new parser:
> strict mode identification needs to be done during the parse stage and
> alters the state of the parser to reject constructs that lax mode would allow,
> and produces parse time errors instead). For efficiency reasons this basically
> forces us into a recursive-descent parser :-/

I saw the new ECMA standard, and I think I can handle the strict mode restrictions, at least those I have seen ("local identifiers cannot be used before a var declaration" seems a more complex one, since you have to store all the primary identifiers during parsing). I checked Wikipedia, and it turns out my parser is a predictive parser (it uses a stack, but never backtracks). Predictive parsers belong to the family of recursive descent parsers.
Oliver Hunt
Comment 5 2010-01-22 15:05:31 PST
(In reply to comment #4)
> (In reply to comment #3)
> > I'm unsure why the syntax checker is separate from the parser. Syntax checking
> > is basically answering the question "can i parse this".
>
> The grammar do both syntax checking and generating nodes. Syntax checker only
> checks syntax, and used only for skipping nested function bodies. That is what
> I tried to explain. They have lot in common, but they have differences as well.

Basically the parser should be set up with a flag along the lines of

    parse(int maxFunctionASTDepth) {
        ...
    }

then the various functions that create nodes should basically be

    if (currentFunctionDepth < maxFunctionDepth)
        newNode = new ...;

Having the ability to dynamically vary how deep we generate the tree will be beneficial for certain real-world cases.

> > There are semantic checks that we will want to do while parsing:
> > the various "features" we track
> > variable and function accumulation
> > partial AST generation
>
> That is what my grammar do :) It is "binary compatible" with the bison parser.
> The only thing which is not tracked is the "arguments feature" because I don't
> really understand its purpose yet :(

Whether a function ever references a variable with the name "arguments" -- we need to know whether the arguments object may have been used.

> > New things we _must_ be able to do in the new parser:
> > strict mode identification needs to be done during the parse stage and
> > alters the state of the parser to reject constructs that lax mode would allow,
> > and produces parse time errors instead). For efficiency reasons this basically
> > forces us into a recursive-descent parser :-/
>
> I saw the new ECMA standard, and I think I can hanle the strict mode
> restrictions, at least those which I saw (local identifiers cannot be used
> before a var declaration seems a more complex one, since you have to store all
> the primary identifiers during parsing). I checked wikipedia, and turned out my
> parser is a predictive parser (uses a stack, but never backtracks). The
> predictive parser belongs to the family of recursive descent parsers.

Everything should be trivial, I think; the parser is gnarly enough to make me sad though.
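A compilable sketch of the depth-limited node generation discussed above; the class and member names (Parser, m_maxFunctionDepth, m_currentFunctionDepth) are illustrative only, not the actual JavaScriptCore API:

    // Hypothetical sketch: build real AST nodes only for functions nested less
    // deeply than maxFunctionDepth; deeper bodies would just be syntax-checked.
    struct Node { };   // stand-in for a real AST node type

    class Parser {
    public:
        explicit Parser(int maxFunctionDepth = 1)
            : m_maxFunctionDepth(maxFunctionDepth)
            , m_currentFunctionDepth(0)
        {
        }

        Node* parseFunctionBody()
        {
            Node* body = 0;
            if (m_currentFunctionDepth < m_maxFunctionDepth)
                body = new Node;   // generate nodes at this nesting level
            // else: syntax-check only, remembering just the body's source range
            ++m_currentFunctionDepth;   // entering a nested function
            // ... parse statements, which may recurse into parseFunctionBody() ...
            --m_currentFunctionDepth;
            return body;
        }

    private:
        int m_maxFunctionDepth;
        int m_currentFunctionDepth;
    };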
Zoltan Herczeg
Comment 6 2010-01-22 15:20:03 PST
> Basically the parser should be set up with a flag along the lines of
> parse(int maxFunctionASTDepth) {
> ...
> }
>
> then the various functions to create nodes should basically be
> if (currentFunctionDepth < maxFunctionDepth)
> newNode = new ...;
>
> Having the ability to dynamically vary how deep we generate the tree will be
> beneficial for certain real world cases.

No problem. I could extend my grammar to generate nodes for nested functions as well (by default, the function depth counter will be 1, but you can increase it). OK, just tell me your requirements; I hope my parser-beast can handle anything :)
Oliver Hunt
Comment 7 2010-01-22 15:23:24 PST
(In reply to comment #6)
> > Basically the parser should be set up with a flag along the lines of
> > parse(int maxFunctionASTDepth) {
> > ...
> > }
> >
> > then the various functions to create nodes should basically be
> > if (currentFunctionDepth < maxFunctionDepth)
> > newNode = new ...;
> >
> > Having the ability to dynamically vary how deep we generate the tree will be
> > beneficial for certain real world cases.
>
> No problem. I could extend my grammar to go generate nodes for nested functions
> as well (by default, the function depth counter will be 1, but you can increase
> it).
>
> Ok, just tell me your requirements, I hope my parser-beast can handle anything
> :)

Any idea how easy/hard your model would make free variable analysis?
Zoltan Herczeg
Comment 8 2010-01-22 22:12:09 PST
> Any idea how easy/hard your model would make free variable analysis?

Agreed, it would be good to put some analysis effort in here. Features the new parser should support:

- generating nodes for nested functions
  effort: easy. Actually this feature was in the code, but I removed it to gain some speed.
- variable analysis
  effort: easy. We need some kind of 'set' where we can put the variables (I assume we are talking about primary literals here). Because the tokens returned by the lexer are analysed in order, we could easily check any dependency requirements between primary literals.
- some other strict mode requirements:
  - with is not supported
    effort: easy
  - eval is a reserved word
    effort: easy
  - arguments is a reserved word
    nooo, arguments again... anyway, probably easy
Zoltan Herczeg
Comment 9 2010-01-25 06:33:10 PST
Created attachment 47344 [details] Improved Dump AST
Zoltan Herczeg
Comment 10 2010-01-25 06:38:48 PST
Created attachment 47345 [details] A new JavaScript Grammar, which passes all jscore regression tests.

Compared against the bison parser on the SunSpider parser tests, namely jquery, mootools and prototype: the AST is the same. There are some divot differences because regular expression literals are not correctly handled by the bison parser, and line info for non-statements is useless in the bison parser.
Zoltan Herczeg
Comment 11 2010-01-25 06:44:34 PST
The mootools parser test showed an interesting behaviour of the bison parser on the following expression (simplified example):

    for (var i in x) for (var j in y) ;

According to the bison parser, j is declared earlier than i. My predictive parser handles the declaration order correctly.
Zoltan Herczeg
Comment 12 2010-01-25 14:06:14 PST
I am thinking about the future of this work. I will do layout and performance testing tomorrow, but there are still many other tasks ahead. It would be good to break this patch into smaller pieces, but that doesn't seem possible, since everything is needed: I cannot cut out things like 'switch', 'for' or 'function' parsing. Since this is a huge piece of code, many things will probably need to be improved, like changing variable names or restructuring code. This will need a lot of reviewer effort, for which I thank you in advance.

I planned this parser to be robust and future proof. It seems these advantages will be put to good use sooner than I expected, for things such as strict JS parsing or generating an AST for nested function bodies. However, it would be better to do them in separate patches rather than in an all-in-one package, and probably better to start that work after I get some feedback on the code.

I have one question about function nesting. We use ParserArenaData<DeclarationStacks::...> variables in the code, but FunctionBodyNode expects DeclarationStacks::... variables. What would be a good solution for this difference?

Anyway, although the code is big, the concept behind it is very simple. Step-by-step debugging on simple JS sources can easily reveal how it works.
Zoltan Herczeg
Comment 13 2010-01-26 03:19:08 PST
Created attachment 47396 [details] Selectable JavaScript grammar

The patch is reworked. It no longer removes the original bison parser; the new parser is selectable at compile time (ENABLE_PREDICTIVE_PARSER) and is disabled by default.
Zoltan Herczeg
Comment 14 2010-01-26 03:22:29 PST
New parser runtime results --suite parse-only:

TEST                          COMPARISON       FROM               TO                DETAILS
=============================================================================
** TOTAL **:                  2.19x as fast    69.5ms +/- 1.1%    31.8ms +/- 0.9%   significant
=============================================================================
  jquery:                     2.20x as fast    10.8ms +/- 2.8%    4.9ms +/- 4.6%    significant
    1.3.2:                    2.20x as fast    10.8ms +/- 2.8%    4.9ms +/- 4.6%    significant
  mootools:                   2.36x as fast    11.8ms +/- 2.6%    5.0ms +/- 0.0%    significant
    1.2.2-core-nc:            2.36x as fast    11.8ms +/- 2.6%    5.0ms +/- 0.0%    significant
  prototype:                  2.12x as fast    12.5ms +/- 3.0%    5.9ms +/- 3.8%    significant
    1.6.0.3:                  2.12x as fast    12.5ms +/- 3.0%    5.9ms +/- 3.8%    significant
  concat:                     2.15x as fast    34.4ms +/- 1.1%    16.0ms +/- 0.0%   significant
    jquery-mootools-prototype: 2.15x as fast   34.4ms +/- 1.1%    16.0ms +/- 0.0%   significant

No observable differences on other benchmarks (SunSpider, V8, WindScorpion).
WebKit Review Bot
Comment 15 2010-01-26 03:27:14 PST
Attachment 47396 [details] did not pass style-queue:

Failed to run "WebKitTools/Scripts/check-webkit-style" exit_code: 1
JavaScriptCore/parser/PredictiveGrammar.h:39: Code inside a namespace should not be indented. [whitespace/indent] [4]
JavaScriptCore/parser/PredictiveGrammarConsts.h:90: Code inside a namespace should not be indented. [whitespace/indent] [4]
JavaScriptCore/parser/SyntaxChecker.h:36: Alphabetical sorting problem. [build/include_order] [4]
JavaScriptCore/parser/SyntaxChecker.h:46: Code inside a namespace should not be indented. [whitespace/indent] [4]
JavaScriptCore/parser/Lexer.h:36: Code inside a namespace should not be indented. [whitespace/indent] [4]
JavaScriptCore/parser/Lexer.h:103: This { should be at the end of the previous line [whitespace/braces] [4]
JavaScriptCore/parser/Lexer.h:109: This { should be at the end of the previous line [whitespace/braces] [4]
Total errors found: 7

If any of these errors are false positives, please file a bug against check-webkit-style.
Zoltan Herczeg
Comment 16 2010-01-26 06:40:30 PST
Created attachment 47407 [details] Selectable JavaScript grammar (+mac build support)
WebKit Review Bot
Comment 17 2010-01-26 06:47:17 PST
Attachment 47407 [details] did not pass style-queue:

Failed to run "WebKitTools/Scripts/check-webkit-style" exit_code: 1
JavaScriptCore/parser/PredictiveGrammar.h:39: Code inside a namespace should not be indented. [whitespace/indent] [4]
JavaScriptCore/parser/PredictiveGrammarConsts.h:90: Code inside a namespace should not be indented. [whitespace/indent] [4]
JavaScriptCore/parser/SyntaxChecker.h:46: Code inside a namespace should not be indented. [whitespace/indent] [4]
JavaScriptCore/parser/Lexer.h:36: Code inside a namespace should not be indented. [whitespace/indent] [4]
Total errors found: 4

If any of these errors are false positives, please file a bug against check-webkit-style.
Oliver Hunt
Comment 18 2010-01-26 14:35:51 PST
Comment on attachment 47407 [details] Selectable JavaScript grammar (+mac build support)

I don't think there's any reason to still use cumulative "feature" flags. What you should do is have a function stack (very much pseudo code):

    struct FunctionInformation {
        Features features;
        Vector<VariableDeclaration> variables;
        ... etc ...
    }
    Vector<FunctionInformation> functions;

Then when the parser starts a new function it should simply do:

    functions.append(FunctionInformation);

And all the various constructs that add features become:

    functions.last().features |= NewFeatures;

    case VarDeclaration:
        ...
        functions.last().variables.add(theNewVariableDeclaration);
        ...

etc. (so we lose all the various array and feature unification rules)

The only reason we use cumulative feature flags currently is because yacc/bison generate bottom-up parsers, which by definition can't do this (the parser doesn't "know" it's producing a function until it has already parsed the child statements).

In terms of style, rather than the current

    top(-n) = new Node(top(), top(-1), ..., top(-n))

I would prefer

    push(new Node(pop(), pop(), ...))

which I think is much nicer.
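A self-contained approximation of the per-function bookkeeping sketched above, using standard containers and invented names (WebKit itself would use Vector and Identifier); it only shows how features and variables land directly on the current function instead of being merged up afterwards:

    // Hypothetical sketch, not the real parser: each function on the stack owns
    // its own feature bits and variable list, so nothing has to be unified later.
    #include <string>
    #include <vector>

    typedef unsigned Features;
    static const Features ArgumentsFeature = 1u << 0;
    static const Features WithFeature = 1u << 1;

    struct FunctionInformation {
        Features features;
        std::vector<std::string> variables;
        FunctionInformation() : features(0) { }
    };

    static std::vector<FunctionInformation> functionStack;

    static void startFunction()
    {
        functionStack.push_back(FunctionInformation());
    }

    static void sawVarDeclaration(const std::string& name)
    {
        functionStack.back().variables.push_back(name);
    }

    static void sawArgumentsIdentifier()
    {
        // The feature lands directly on the innermost function being parsed.
        functionStack.back().features |= ArgumentsFeature;
    }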
Zoltan Herczeg
Comment 19 2010-01-26 15:21:54 PST
(In reply to comment #18)
> (From update of attachment 47407 [details])
> I don't think there's any reason to still use cumulative "feature" flags.

I would prefer a scope flag myself, but many nodes actually use these features as arguments. For example, BinaryOpNode (and its derived classes) has rightHasAssignments (example: a + (b * c = d) -> the AddNode's rightHasAssignments must be true), TryNode has catchHasEval, BracketAccessorNode (and similar nodes) has subscriptHasAssignments, etc. It would be really hard to determine these flags without local features. Actually, I don't know how these flags modify the behaviour of emitByteCode, but if they are unnecessary, I could remove all of them :)

> What you should do is have a function stack (very much pseudo code):
> struct FunctionInformation {
>     Features features;
>     Vector<VariableDeclaration> variables;
>     ... etc ...
> }
> Vector<FunctionInformation> functions;

I have exactly the same idea about the function stack, except for a minor tweak: the topmost FunctionInformation is stored by the Grammar itself, and pushed onto the stack when a nested function is parsed. It is a little faster than an array access.

> case VarDeclaration:
>     ...
>     functions.last().variables.add(theNewVariableDeclaration);
>     ...
>
> etc
>
> (so we lose all the various array and feature unification rules)
>
> In terms of style rather than the current top(-n) = new Node(top(), top(-1),
> ..., top(-n)) i would prefer
> push(new Node(pop(), pop(), ...)
>
> Which i think is much nicer.

Wow, you are right. I didn't think about it before. This is actually possible, since the items are not freed on the top of the stack, and the values can be accessed before a push reuses the entry. I hope the compiler won't mess up multiple pop()s.
Guillaume Chereau
Comment 20 2010-01-26 23:39:43 PST
Hello all, I tried the new parser on the JavaScript parsing test from Holger's benchmarks (http://gitorious.org/qtwebkit/performance), and I get the following results on my laptop:

    bison parser : 192 ms
    new parser   : 115 ms

(it is ~1.6 times faster)

The test consists of parsing a set of JavaScript code taken from a list of popular websites. However, I couldn't test the whole set because of an error when parsing lines of this form: "new f(x++)". I get:

    ASSERTION FAILED: m_unaryMode == PrimaryMode || m_unaryMode == PrimaryDotMode || m_unaryMode == PrimaryArgumentMode
    (../../../JavaScriptCore/parser/PredictiveGrammar.cpp:273 void JSC::Grammar::primaryExpressionToNode())
    Segmentation fault
Zoltan Herczeg
Comment 21 2010-01-27 06:12:29 PST
Created attachment 47525 [details] Next attempt
WebKit Review Bot
Comment 22 2010-01-27 06:13:49 PST
Attachment 47525 [details] did not pass style-queue:

Failed to run "WebKitTools/Scripts/check-webkit-style" exit_code: 1
JavaScriptCore/parser/PredictiveGrammar.h:39: Code inside a namespace should not be indented. [whitespace/indent] [4]
JavaScriptCore/parser/PredictiveGrammarConsts.h:90: Code inside a namespace should not be indented. [whitespace/indent] [4]
JavaScriptCore/parser/SyntaxChecker.h:46: Code inside a namespace should not be indented. [whitespace/indent] [4]
JavaScriptCore/parser/Lexer.h:36: Code inside a namespace should not be indented. [whitespace/indent] [4]
JavaScriptCore/parser/PredictiveGrammar.cpp:2699: Should have a space between // and comment [whitespace/comments] [4]
Total errors found: 5

If any of these errors are false positives, please file a bug against check-webkit-style.
Zoltan Herczeg
Comment 23 2010-01-27 06:27:34 PST
Thank you for the feedback!

> (it is ~ 1.6 time better)

Great! I hope that was in release mode :)

> "new f(x++)"

Thanks. I also found some bugs myself, and the fast/js layout tests showed me some others, like: (1++).x; a(x++).b; for (new a in c) { }, and "a + b = c" should fail in the syntax checker as well. All of them are fixed in the newest patch.

The constructor of the parser was extended to:

    Grammar(JSGlobalData* globalData, int maxFunctionDepth = 1, bool strict = false)

I started to work on maxFunctionDepth first, and it is supported now. It is tested on small examples, and all JSCore tests pass with maxFunctionDepth = 3. The 'strict' argument is just added for future use; I really want to implement the strict mode checks in a different patch :)

There are only two fast/js tests which fail:

fast/js/reparsing-semicolon-insertion.html - the comment says:
    // According to the ECMA spec, these should all be syntax errors. However, the
    // pre-existing behaviour of JavaScriptCore has always been to accept them.

fast/js/function-declaration-statement.html - crashes. No idea what is happening here (yet), since the JS code snippets are parsed correctly by the command line tool.
Oliver Hunt
Comment 24 2010-01-27 15:54:36 PST
Comment on attachment 47525 [details] Next attempt

I don't like the

    case '-': ....
    case '{': ...

etc. idiom. You should stick to the symbolic names like OPENBRACE -- this will also reduce the lexer diffs, and reduce the risk of accidentally introducing a symbolic name that overloads a character.

We also need to get the full set of tests to parse correctly. Then we need to fuzz this (we don't have many -- if any -- invalid script tests, alas).

--Oliver
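For illustration, a tiny sketch of the difference (the token names here are invented, not JavaScriptCore's real token set): switching on named lexer tokens rather than raw character literals keeps one-character and multi-character tokens from colliding:

    // Hypothetical token enum; the lexer would return these instead of raw chars.
    enum JSTokenType { OPENBRACE, CLOSEBRACE, PLUS, MINUS, MINUSMINUS, EOFTOK };

    static bool isAdditiveOperator(JSTokenType token)
    {
        switch (token) {
        case PLUS:
        case MINUS:        // preferred over: case '-':
            return true;
        default:
            return false;
        }
    }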
Zoltan Herczeg
Comment 25 2010-01-28 06:29:29 PST
Created attachment 47616 [details] Layout test for parser syntax checking
Zoltan Herczeg
Comment 26 2010-01-28 06:41:33 PST
(In reply to comment #25)
> Created an attachment (id=47616) [details]
> Layout test for parser syntax checking

This test contains syntax checking examples. Both parsers can handle these test cases. However, I noticed something about the bison parser:

    function f() { return 6 + }

is accepted, although

    return 6 +

is not.
Zoltan Herczeg
Comment 27 2010-01-29 06:14:04 PST
Created attachment 47711 [details] Another attempt
Zoltan Herczeg
Comment 28 2010-01-29 06:15:11 PST
Created attachment 47712 [details] Layout test for parser syntax checking (patch 2) Adding more tests.
WebKit Review Bot
Comment 29 2010-01-29 06:17:37 PST
Attachment 47711 [details] did not pass style-queue:

Failed to run "WebKitTools/Scripts/check-webkit-style" exit_code: 1
JavaScriptCore/parser/PredictiveGrammar.h:39: Code inside a namespace should not be indented. [whitespace/indent] [4]
JavaScriptCore/parser/PredictiveGrammarConsts.h:90: Code inside a namespace should not be indented. [whitespace/indent] [4]
JavaScriptCore/parser/SyntaxChecker.h:46: Code inside a namespace should not be indented. [whitespace/indent] [4]
JavaScriptCore/parser/Lexer.h:36: Code inside a namespace should not be indented. [whitespace/indent] [4]
Total errors found: 4

If any of these errors are false positives, please file a bug against check-webkit-style.
Zoltan Herczeg
Comment 30 2010-01-29 06:28:48 PST
(In reply to comment #24)
> (From update of attachment 47525 [details])
> I don't like the
> case '-': ....
> case '{': ...
> etc

Done. I also found another two errors in my parser: "for (a,b in c ; ;) {}" was accepted, and "if (a) function f() {}" caused a segfault. Both of them are fixed (these were minor bugs; only a one-line change was required). Now all jscore and layout tests pass successfully. Finally :)

> We also need to get the full set of tests to parse correctly. Then we need to
> fuzz this (we don't have many -- if any -- invalid script tests alas)

I have attached a new patch here. It contains about 250 strings, half of them valid JavaScript source code and the other half not. The strings were tested both as plain strings and with a "function f() {}" function declaration wrapped around them. This is useful since, for example, "return 6+" fails with a syntax error in the bison parser, while "function f() { return 6+ }" does not.

I hope the patch is not far from final. It is hard to update this amount of code again and again :) When it reaches a nearly finished state, I will add a ChangeLog entry and update the mac project rules.
Zoltan Herczeg
Comment 31 2010-01-29 06:51:54 PST
TEST                          COMPARISON       FROM               TO                DETAILS
=============================================================================
** TOTAL **:                  2.16x as fast    69.5ms +/- 1.3%    32.2ms +/- 0.9%   significant
=============================================================================
  jquery:                     2.14x as fast    10.7ms +/- 3.2%    5.0ms +/- 0.0%    significant
    1.3.2:                    2.14x as fast    10.7ms +/- 3.2%    5.0ms +/- 0.0%    significant
  mootools:                   2.26x as fast    11.3ms +/- 3.1%    5.0ms +/- 0.0%    significant
    1.2.2-core-nc:            2.26x as fast    11.3ms +/- 3.1%    5.0ms +/- 0.0%    significant
  prototype:                  2.08x as fast    12.7ms +/- 2.7%    6.1ms +/- 3.7%    significant
    1.6.0.3:                  2.08x as fast    12.7ms +/- 2.7%    6.1ms +/- 3.7%    significant
  concat:                     2.16x as fast    34.8ms +/- 0.9%    16.1ms +/- 1.4%   significant
    jquery-mootools-prototype: 2.16x as fast   34.8ms +/- 0.9%    16.1ms +/- 1.4%   significant

Since the code is near final, I have measured the gain again. No surprises here. We have tried it on real web pages, but unfortunately we experienced about 100% deviation, so that comparison is useless. (I just got a message that the regression tests pass on mac as well! Excellent news for me.)

charlie137, would you repeat your measurements again?
Guillaume Chereau
Comment 32 2010-01-29 10:05:31 PST
My previous test was in debug mode, which is why the difference between the old and new parser was not as big. Here are the results (in ms) using the full data set, in debug and release mode, for the old and new parser:

    | mode    | old | new |
    |---------+-----+-----|
    | debug   | 836 | 689 |
    | release | 204 |  75 |

Strangely, with this dataset the difference is not that big in debug mode, but huge in release mode (compiled with -O3).
Oliver Hunt
Comment 33 2010-02-01 12:26:48 PST
Comment on attachment 47711 [details] Another attempt

I really don't like the top(-n) stuff. I would much rather all were replaced with pop(), etc. I think if there isn't a performance hit it would be _much_ clearer to have all code be of the form

    GrammarStackItem rhs = pop();
    GrammarStackItem lhs = pop();
    push(new AddNode(lhs, rhs));

Or whatever, rather than

    top(-2) = new AddNode(top(-1), top(-2));
    pop();
Zoltan Herczeg
Comment 34 2010-02-03 06:35:37 PST
Created attachment 48022 [details] Next draft

Top-less version :D (+25% .cpp code size, +0% functionality). JS core and layout tests pass. Fortunately, the parsing speed is not affected (based on SunSpider --parse-only tests). This was more difficult work than I expected, but it is finally finished.
Zoltan Herczeg
Comment 35 2010-02-03 07:04:13 PST
Oliver, what is your opinion on the syntax checker patch?

Tasks ahead:
* I haven't broken up the node generator mechanism yet. Partial object generation would only be helpful for a few complex nodes, like try, switch and for (and they need to do some consistency checks before emitByteCode). Unfortunately, we still have to store the parsing status, which means we can perhaps decrease the required number of stack items from 3 down to 2.
* Line info and divot points. It is a mess now; every node does things differently (depending on how bison parses them). We should introduce clear rules and follow them. We can do that now using our predictive parser.
* Generating parser rules. I have no idea how to do this, since we have so many different language constructs. I couldn't find any real node grouping rules. The two major groups are expressions and statements; statements are more difficult to implement than expressions.

Is there anything else?
Oliver Hunt
Comment 36 2010-02-03 16:17:10 PST
Comment on attachment 48022 [details] Next draft

Okay, I'm going to say that the separate syntax checker should go -- there's too much risk of divergent behaviour between the syntax checker and the real parser. If we want pure syntax checking to be faster, we just need a call into the parser that says it shouldn't build an AST at any level.
Zoltan Herczeg
Comment 37 2010-02-05 05:20:12 PST
Created attachment 48223 [details] Merging syntax checker and grammar The merge is done, although I am not sure this is a step ahead, since it is slower. *1.028x as slow* 32.6ms +/- 1.1% 33.5ms +/- 2.1% significant I feel the code can be improved (and fast/js/function-dot-arguments.html fails), but first I would like to hear your opinion whether I should continue this work or step back to the previous patch. Besides, is the parser layout test OK?
Zoltan Herczeg
Comment 38 2010-02-05 05:52:04 PST
> *1.028x as slow* 32.6ms +/- 1.1% 33.5ms +/- 2.1% significant Just for the sake of clarity, it is compared to the previous version with a separated checker. Compared to the bison parser, it is still more than 2 times faster. 2.07x as fast 69.5ms +/- 1.3% 33.5ms +/- 2.1%
Darin Adler
Comment 39 2010-02-05 06:53:01 PST
A way to avoid having two copies of the code and yet still have the non-syntax-check version be fast is to use C++ templates. You can have a boolean template argument decide whether functions are for syntax checking or for compiling, and then avoid adding any branches.

I considered mentioning this right when Oliver did his review, but for some reason assumed that performance would not be a problem. If this technique works then we get the best of both worlds: only one set of source code to maintain, but the full speed we'd otherwise get by having the syntax checker entirely separate.
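A minimal sketch of this technique, with invented builder types (the names below are not the ones the patch uses): the same templated parsing code is instantiated once with a tree builder and once with a do-nothing checker, so the checking instantiation carries no AST-building branches:

    // Hypothetical sketch of template-selected parsing modes.
    struct ExpressionNode { double value; };

    struct ASTBuilder {
        typedef ExpressionNode* Expression;
        Expression makeNumber(double value)
        {
            ExpressionNode* node = new ExpressionNode;
            node->value = value;
            return node;
        }
    };

    struct SyntaxChecker {
        typedef bool Expression;                          // just "a valid expression was seen"
        Expression makeNumber(double) { return true; }    // no allocation at all
    };

    template<class TreeBuilder>
    typename TreeBuilder::Expression parseNumericLiteral(TreeBuilder& builder, double value)
    {
        // Identical parsing code for both modes; the compiler specializes each one.
        return builder.makeNumber(value);
    }

With this shape, "check only" parsing is just a different template instantiation: one body of source code, two compiled specializations, and no runtime flag tested inside the shared code.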
Zoltan Herczeg
Comment 40 2010-02-05 08:15:31 PST
> A way to avoid having two copies of the code and yet still have the
> non-syntax-check version be fast, is to use C++ templates.

I like this idea. A difference between the syntax checker and the grammar is not only the simplified code, but also the simplified data structures. Fortunately, the data structures can also be template arguments. My question is: what is the best way to do it? What about the following:

    template<bool syntaxChecker, class StackItem, class MemberVariables> class Grammar {};

    struct GrammarStackItem { ... };
    struct SyntaxCheckerStackItem { ... };
    struct SyntaxCheckerMemberVariables { ... };
    struct GrammarMemberVariables {
        ...
        Grammar<true, SyntaxCheckerStackItem, SyntaxCheckerMemberVariables> checker;
    };

I hope the declaration order will be OK this way.
Zoltan Herczeg
Comment 41 2010-02-05 21:48:57 PST
Hmm, I am not sure all compilers will like accessing non-existing members, even if the branch is known to be false. Instead, I could use ALWAYS_INLINE get/set methods, and put ASSERT_NOT_REACHED in those getters/setters which do not exist for the current structure.
Darin Adler
Comment 42 2010-02-07 13:07:12 PST
(In reply to comment #41)
> Hm, I am not sure all compiler will like to access non-existing members even if
> the a branch is known to be false.

It won't.

> Instead, I could use ALWAYS_INLINE get/set
> methods, and put ASSERT_NOT_REACHED to those getters/setter which are not exist
> for the current structure.

That might work. Generally the technique is to have functions, in the class passed to the template, that perform all access to the conditional members, and then empty functions with the same names in the other class passed to the template.
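An illustrative sketch of the mirrored-accessor idea Darin describes (the member and accessor names here are made up): both classes expose the same interface, so templated parser code compiles against either one, and the checker's versions simply do nothing:

    // Hypothetical sketch: real members in the grammar's class, empty stand-ins
    // in the syntax checker's class.
    struct GrammarLocalMembers {
        int assignmentCount;
        GrammarLocalMembers() : assignmentCount(0) { }
        void noteAssignment() { ++assignmentCount; }
        int assignments() const { return assignmentCount; }
    };

    struct SyntaxCheckerLocalMembers {
        void noteAssignment() { }             // no storage; compiles away
        int assignments() const { return 0; }
    };

    template<class LocalMembers>
    void parseAssignmentExpression(LocalMembers& members)
    {
        // ... parse "lhs = rhs" ...
        members.noteAssignment();             // harmless no-op when syntax checking
    }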
Oliver Hunt
Comment 43 2010-02-08 12:15:55 PST
Comment on attachment 47712 [details] Layout test for parser syntax checking (patch 2) r=me on the testcase -- there's no reason not to have the test in the repository
Eric Seidel (no email)
Comment 44 2010-02-08 15:18:08 PST
Attachment 47712 [details] was posted by a committer and has review+, assigning to Zoltan Herczeg for commit.
Zoltan Herczeg
Comment 45 2010-02-09 01:35:06 PST
Comment on attachment 47712 [details] Layout test for parser syntax checking (patch 2) Landed in r54530
Zoltan Herczeg
Comment 46 2010-02-09 06:40:22 PST
Created attachment 48409 [details] template based approach

Finally finished. Actually, this was the hardest work so far. Darin's prediction was right: this approach is nearly as fast as the separate grammar / syntax checker approach (32.6ms, up from 32.2ms). Besides, the code is nearly as big :) Regardless, I think this is the best trade-off between speed and maintainability.

I have a small design question. I know that functions defined inside a class definition are implicitly inline, but I still like to put an ALWAYS_INLINE macro before the definition to stress that I want it to be an inline function. This information can be useful for those who want to change the code. What is WebKit's policy here? (I couldn't find anything about it in the coding guidelines.)
Alexey Proskuryakov
Comment 47 2010-02-09 08:17:29 PST
We usually add ALWAYS_INLINE when profiling suggests measurable benefit from it.
Darin Adler
Comment 48 2010-02-09 10:46:25 PST
Comment on attachment 48409 [details] template based approach The pbxproj part of this patch is failing to apply, so the trybots are unable to evaluate it.
Darin Adler
Comment 49 2010-02-09 10:47:33 PST
(In reply to comment #47) > We usually add ALWAYS_INLINE when profiling suggests measurable benefit from > it. Yes, that's the guideline. Don't use it unless it's needed to improve performance, and feel free to use it if it does improve performance. Same for NEVER_INLINE.
Zoltan Herczeg
Comment 50 2010-02-09 11:47:14 PST
(In reply to comment #48)
> (From update of attachment 48409 [details])
> The pbxproj part of this patch is failing to apply, so the trybots are unable
> to evaluate it.

I haven't updated my project file for more than a week, and I don't intend to do it until major changes are required, since otherwise I cannot compare the performance gains or losses anymore. When you say the general direction is OK and only smaller reworkings are required, I will of course update the project. Each of the last 3 patches changed at least 50% of the codebase, and I wanted to make sure I don't mess something up :)

Thank you for your advice about ALWAYS_INLINE. I prefer all one-line getters/setters to be inline functions.
Darin Adler
Comment 51 2010-02-09 11:59:36 PST
(In reply to comment #50)
> Thank you for your advices about ALWAYS_INLINE. I prefer all 1 line getter
> setters to be inline functions.

Yes. But please note that we only use ALWAYS_INLINE if there’s some reason they are not getting inlined. We don’t use that macro to express preferences.
Darin Adler
Comment 52 2010-02-09 12:01:18 PST
The major problem remaining here is that this new parser may be too difficult to modify when we make changes to the language or fix bugs. I believe Oliver is investigating generating this parser from some sort of grammar, which seems like a way to get a parser that works just like this, but remain flexible.
Geoffrey Garen
Comment 53 2010-02-09 14:52:26 PST
These comments probably would have been more useful earlier in your development process. Sorry about that.

I wish we knew exactly what was slow about our YACC parser. I hope we're not just falling prey to Second System Syndrome here, and throwing out working code that could be optimized. Do you have any data on where the YACC parser spends its time?

If I had to guess, I would guess that the main optimization in this rewrite, as measured by SunSpider, was to do less work in the syntax checking case. But there seems to be low-hanging fruit in the existing YACC grammar for doing those same optimizations. Did you consider that option? If so, are there significant barriers to it?

I found the code in this patch a bit hard to follow, and I'm worried that it might be difficult to modify this parser in the future. Here are a few things that could make this patch clearer:

* I don't think there's any need for the ENABLE(PREDICTIVE_PARSER) flag. Having two parsers just makes the code harder to follow. I think this patch can just remove the old parser -- that's its intent, after all.
* Why does the parser keep its own m_stackPointer instead of just using Vector::size and Vector::last?
* It's a surprising idiom to grow the stack, and then modify the top-most item. I would recommend creating a stack entry and pushing it onto the top of the stack instead.
* The two different copies of the *StackItem and *LocalMembers classes make the structure of the stack hard to follow. Did you provide two copies of these classes just to support ASSERT_NOT_REACHED calls? If so, is there another way to accomplish what those assertions try to accomplish?
Geoffrey Garen
Comment 54 2010-02-09 15:15:57 PST
I'm attaching an example of the kind of analysis you could do of YACC (jquery-yacc.zip). I set YYDEBUG to 1 and ran the jquery parsing test. It looks like the most common state the parser was in was 607, which is a choice between shifting '.', '(', or '[', and reducing MemberExpr_NoNode or NewExpr_NoNode.
Geoffrey Garen
Comment 55 2010-02-09 15:16:52 PST
Created attachment 48444 [details] jquery-yacc.zip
Zoltan Herczeg
Comment 56 2010-02-10 00:57:23 PST
Thank you for your comments.

> I wish we knew exactly what was slow about our YACC parser. I hope we're not
> just falling prey to Second System Syndrome here, and throwing out working code
> that could be optimized. Do you have any data on where the YACC parser spends
> its time?

This is only half of the story. The real issue here is not the present, but the future of JavaScript. The strict mode defined by ECMA requires an LL parser. This example was mentioned before:

    for (var a in b) for (var c in a) { }

According to the bison parser, 'c' is defined first and 'a' later, which is false. This is in the nature of a bottom-up parser (right-recursive LALR grammar), and probably cannot be helped. In strict mode, the "var id" definition must precede the first use of "id". The above statement is valid in strict mode, since "var a" is defined before it is used. However, this check cannot be done during parsing in a bison parser. Post-parsing checks take extra time, and they would be really hard for nested function bodies, since no AST is generated for them. That means we need an LL (left-recursive) grammar to do these strict mode checks at parse time. Unfortunately, JavaScript is not left recursive, which makes things difficult. Oliver is trying to find a solution, which would basically be an automatically generated LL parser that can enter right-recursive mode when necessary. The task is really challenging, but if it were successful (with the resulting parser much faster than bison) it would be worth some nice research papers at some big conferences (or even more). Anyway, your idea is good; profiling can help with further parser optimizations.

> If I had to guess, I would guess that the main optimization in this rewrite, as
> measured by SunSpider, was to do less work in the syntax checking case. But
> there seems to be low-hanging fruit in the existing YACC grammar for doing
> those same optimizations. Did you consider that option? If so, are there
> significant barriers to it?

Actually, no. The node generator is 3 times faster (in pure speed, excluding everything else, including the lexer). The syntax checking is a little better; it is 4 times faster. Besides, do you think it would be worth cutting the bison grammar into two files, one for the NoNode rules and one for everything else?

> * I don't think there's any need for the ENABLE(PREDICTIVE_PARSER) flag. Having
> two parsers just makes the code harder to follow. I think this patch can just
> remove the old parser -- that's its intent, after all.

This is the long-term plan, but we should give all ports some time to prepare for the change. When it is enabled on all ports, we can simply remove grammar.y. In my first patch, I removed grammar.y, but others convinced me not to do it now.

> * Why does the parser keep its own m_stackPointer instead of just using
> Vector::size and Vector::last?

Because the parser stack is ever-growing during parsing. The members of a stack item must still be accessible after it is popped from the stack; an automatic shrink could cause a segmentation fault.

> * It's a surprising idiom to grow the stack, and then modify the top-most item.
> I would recommend creating a stack entry and pushing onto the top of the stack
> instead.

The stack item is really complex, and usually only some members are actually used. We can gain some speed here by not copying unnecessary blocks. Anyway, I think I could change this behaviour if you wish; I don't think the gain would be that significant.

> * The two different copies of the *StackItem and * LocalMembers classes make
> the structure of the stack hard to follow. Did you provide two copies of these
> classes just to support ASSERT_NOT_REACHED calls? If so, is there another way
> to accomplish what those assertions try to accomplish?

StackItem can be SyntaxCheckerStackItem or GrammarStackItem, and LocalMembers can be SyntaxCheckerLocalMembers or GrammarLocalMembers. Naturally, at least one (and sometimes both) of these implementations does not contain an ASSERT_NOT_REACHED for a given getter or setter. We need these inline function calls to support templates.
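A small sketch of the non-shrinking stack behaviour described above (the class and method names are invented, not the patch's actual code): entries stay readable after pop() until a later push() reuses their slot, which is what makes the push(new Node(pop(), pop(), ...)) idiom safe:

    // Hypothetical sketch: storage only ever grows; pop() just moves the top index.
    #include <cstddef>
    #include <vector>

    template<class Item>
    class ParserStack {
    public:
        ParserStack() : m_top(0) { }

        void push(const Item& item)
        {
            if (m_top == m_storage.size())
                m_storage.push_back(item);   // grow only when no old slot is free
            else
                m_storage[m_top] = item;     // reuse a previously popped slot
            ++m_top;
        }

        // The returned reference stays valid until a later push() overwrites the slot.
        Item& pop() { return m_storage[--m_top]; }

        // top() is the newest item; top(-1) the one below it, matching the
        // top(-n) notation used in the thread.
        Item& top(int offset = 0) { return m_storage[m_top - 1 + offset]; }

    private:
        std::vector<Item> m_storage;
        std::size_t m_top;
    };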
Zoltan Herczeg
Comment 57 2010-02-12 01:05:51 PST
Created attachment 48631 [details] Updated patch Updated patch. Actually ALWAYS_INLINE really improves the performance for getter/setters, so I decided to keep them.
Zoltan Herczeg
Comment 58 2010-02-12 01:08:45 PST
Created attachment 48632 [details] profiling data

profiling - gprof on "cat concat-jquery-mootools-prototype.js" 16 times
coverage - gcov on concat-jquery-mootools-prototype.js

Interesting, really.
Zoltan Herczeg
Comment 59 2010-02-12 05:02:07 PST
Created attachment 48638 [details] profiling data (maxFunctionDepth = 32)

The previous profiling data was mainly syntax checking, since the nested functions are never called (nodes are not generated for them). However, our shiny custom parser supports deeper AST generation. I set maxFunctionDepth to 32, which is probably deep enough for the parsing tests, and repeated the measurements.
Eric Seidel (no email)
Comment 60 2010-02-17 15:29:17 PST
I'm confused. Is this bug intentionally marked as a security bug?
Oliver Hunt
Comment 61 2010-02-17 15:36:30 PST
(In reply to comment #60) > I'm confused. Is this bug intentionally marked as a security bug? I don't believe so, if i don't see anything to the contrary in the next few hours i'll drop the flag.
Zoltan Herczeg
Comment 62 2010-02-19 13:30:50 PST
I was informed that this work has no future, so I think this bug should be closed. I plan to write a blog post summarizing the results, and it would be good to link the patches here. Could you drop the security flag? (Could someone give me the right to set the WONTFIX flag in bugzilla?)
Eric Seidel (no email)
Comment 63 2010-02-19 13:34:16 PST
I've fixed your bug permissions. Dropping the security flag per Oliver's comment above.
Oliver Hunt
Comment 64 2010-02-19 13:38:14 PST
(In reply to comment #62) > I was informed this work has no future, so I think this bug should be closed. I > plan to write a blog post about summarizing the results, and would be good to > link the patches here. Could you drop the security flag? (Could someone give me > the right to set the WONTFIX flag in the bugzilla?) I'm not sure we want to close this bug -- maybe we want to retitle this bug, something like "make a decent parser in the model of zoltan's awesome one" ?
Darin Adler
Comment 65 2010-02-20 09:37:17 PST
(In reply to comment #62) > I was informed this work has no future I'm surprised to hear that! I thought there was a big feature for this. My hope is we can use this work to create a fast parser, but enhance it further, restoring the feature the bison one has, where you can change the syntax without having to be an expert on the parser internals.
Darin Adler
Comment 66 2010-02-20 09:37:31 PST
(In reply to comment #65) > I thought there was a big feature for this. future
Oliver Hunt
Comment 67 2010-02-20 18:34:53 PST
(In reply to comment #65)
> (In reply to comment #62)
> > I was informed this work has no future
>
> I'm surprised to hear that!

In its current form it's simply not maintainable -- either we need to somehow make it something that can be understood, or we cannot reasonably use it. ES6 is likely to pick up new syntactic forms, and it is hard to see how someone could add support for any new syntax in this grammar. The problem is that it is effectively a single large piece of spaghetti, and nothing I can think of really resolves this issue. The alternatives (to me) seem to be either writing something that is actually a recursive descent parser, or automatically generating a parser of this form.
Geoffrey Garen
Comment 68 2010-02-22 12:01:47 PST
> The alternatives (to me) seem to be either writing > something that is actually a recursive descent parser, or automatically > generating a parser of this form. Agreed.
Geoffrey Garen
Comment 69 2010-02-22 12:03:59 PST
> This is only half of the story. The real issue here is not the present, but the > future of JavaScript. The strict mode defined by ECMA requires an LL parser. > This example was mentioned before: > > for (var a in b) for (var c in a) { } > > According to the bison parser, 'c' is defined first, and 'a' later, which is > false. This is a nature of a bottom-up parser (right recursive LALR grammar), > and probably cannot be helped. In strict mode, the "var id" definition must > precede the first use of "id". The above statement is valid in strict mode, > since "var a" is defined before it is used. However, this check cannot be done > during the parsing in a bison parser. Post parsing checks takes extra time, and > it would be really hard for nested function bodies, since no AST is generated > for them. Interesting. Thanks for pointing that out. I don't see an obvious good way to solve that problem with a bottom-up parser.
Zoltan Herczeg
Comment 70 2010-03-23 03:43:00 PDT
Created attachment 51410 [details] Version 4.1
Zoltan Herczeg
Comment 71 2010-05-20 23:34:51 PDT
Created attachment 56676 [details] dump AST current version Latest dump AST patch
Zoltan Herczeg
Comment 72 2010-05-21 06:20:54 PDT
Created attachment 56704 [details] latest version of my recursive descent parser

Here comes an updated version of our recursive descent JS parser. It's twice as fast as the bison-generated one. Please have a look.

TEST                          COMPARISON       FROM               TO                DETAILS
=============================================================================
** TOTAL **:                  2.17x as fast    75.4ms +/- 0.9%    34.7ms +/- 1.4%   significant
=============================================================================
  jquery:                     2.29x as fast    11.9ms +/- 1.9%    5.2ms +/- 5.8%    significant
    1.3.2:                    2.29x as fast    11.9ms +/- 1.9%    5.2ms +/- 5.8%    significant
  mootools:                   2.10x as fast    12.4ms +/- 3.0%    5.9ms +/- 3.8%    significant
    1.2.2-core-nc:            2.10x as fast    12.4ms +/- 3.0%    5.9ms +/- 3.8%    significant
  prototype:                  2.12x as fast    13.8ms +/- 2.2%    6.5ms +/- 5.8%    significant
    1.6.0.3:                  2.12x as fast    13.8ms +/- 2.2%    6.5ms +/- 5.8%    significant
  concat:                     2.18x as fast    37.3ms +/- 0.9%    17.1ms +/- 1.3%   significant
    jquery-mootools-prototype: 2.18x as fast   37.3ms +/- 0.9%    17.1ms +/- 1.3%   significant

(Oliver mentioned on IRC that there are plans for writing a recursive descent parser. Why implement another one if there is an already working solution?)
Oliver Hunt
Comment 73 2010-05-21 08:05:46 PDT
(In reply to comment #72) > Created an attachment (id=56704) [details] > latest version of my recursive descent parser > > Here comes an updated version of our recursive descent JS parser. It's twice as fast as the bison-generated one. Please, have a look. > > TEST COMPARISON FROM TO DETAILS > > ============================================================================= > > ** TOTAL **: 2.17x as fast 75.4ms +/- 0.9% 34.7ms +/- 1.4% significant > > ============================================================================= > > jquery: 2.29x as fast 11.9ms +/- 1.9% 5.2ms +/- 5.8% significant > 1.3.2: 2.29x as fast 11.9ms +/- 1.9% 5.2ms +/- 5.8% significant > > mootools: 2.10x as fast 12.4ms +/- 3.0% 5.9ms +/- 3.8% significant > 1.2.2-core-nc: 2.10x as fast 12.4ms +/- 3.0% 5.9ms +/- 3.8% significant > > prototype: 2.12x as fast 13.8ms +/- 2.2% 6.5ms +/- 5.8% significant > 1.6.0.3: 2.12x as fast 13.8ms +/- 2.2% 6.5ms +/- 5.8% significant > > concat: 2.18x as fast 37.3ms +/- 0.9% 17.1ms +/- 1.3% significant > jquery-mootools-prototype: 2.18x as fast 37.3ms +/- 0.9% 17.1ms +/- 1.3% significant > > (Oliver mentioned on IRC that there are plans for writing a recursive descent parser. Why to implement another one if here is an already working solution?) I just find this parser very difficult to follow -- it's possible my parser is insufficiently complete to show the complexity present in this one (my parser is still very very early in its development) but I honestly have difficulty following the code here. Once i have more of the grammar hooked up in my parser (i have yet to get a complete parser for the expression tree written) i'll look at the complexity
Akos Kiss
Comment 74 2010-05-21 10:11:03 PDT
(In reply to comment #73)
> > (Oliver mentioned on IRC that there are plans for writing a recursive descent parser. Why to implement another one if here is an already working solution?)
>
> I just find this parser very difficult to follow -- it's possible my parser is insufficiently complete to show the complexity present in this one (my parser is still very very early in its development) but I honestly have difficulty following the code here. Once i have more of the grammar hooked up in my parser (i have yet to get a complete parser for the expression tree written) i'll look at the complexity

Ohh, is this serious? A whole parser will be reimplemented (how long will it take?), just because the current one (which is ready for use, and which is fast!) is complex?

I have a deja vu... It seems to me that we have played this game before. We were working on the ARM JIT and got our solutions rejected again and again, just to find out later that someone at Apple was already working on something similar.

Long live the open source mentality... :(

P.S.: We are happy to maintain the code we write. We have been around for 2 years, but we are still struggling to get accepted as serious contributors...
Oliver Hunt
Comment 75 2010-05-21 10:25:38 PDT
(In reply to comment #74)
> (In reply to comment #73)
> > > (Oliver mentioned on IRC that there are plans for writing a recursive descent parser. Why to implement another one if here is an already working solution?)
> >
> > I just find this parser very difficult to follow -- it's possible my parser is insufficiently complete to show the complexity present in this one (my parser is still very very early in its development) but I honestly have difficulty following the code here. Once i have more of the grammar hooked up in my parser (i have yet to get a complete parser for the expression tree written) i'll look at the complexity
>
> Ohh, is this serious? A whole parser will be reimplemented (how long will it take?), just because the current one (which is ready for use, and which is fast!) is complex?

No idea, I just need to get enough implemented to get an idea as to whether the complexity of this parser is reasonable. We always attempt to avoid unnecessarily complex code, and this parser seems very strangely constructed.

> I have a deja vu... It seems to me that we played the same game before. We've been working on the ARM JIT and got our solutions rejected again and again, just to find out later on that someone at Apple is already working on something similar.

In the case of ARM we were working on a thumb2 version, and I believe the discussions between you guys and Gavin led to what was a better design overall. Anyhoo, this situation is entirely different -- I started my recursive descent parser only recently; I've only spent around 3 hours on it so far, so it can't even parse an entire expression tree yet. I also told Zoltan the first time I saw him on IRC again.

> Long live the open source mentality... :(
>
> P.S.: We are happy to maintain the code we write. We have been around for 2 years, but we are still strugling to get us accepted as serious contributors...

We do consider you guys to be serious contributors; we frequently talk to Zoltan and Ossy on IRC. However, we really don't like big complex code chunks landing unless we're sure that the complexity is necessary -- this applies to everyone, including other people from Apple.
Akos Kiss
Comment 76 2010-05-22 02:11:31 PDT
(In reply to comment #75)
> > I have a deja vu... It seems to me that we played the same game before. We've been working on the ARM JIT and got our solutions rejected again and again, just to find out later on that someone at Apple is already working on something similar.
>
> In the case of ARM we were working on a thumb2 version and i believe the discussions between you guys and gavin led to what was a better design overall.

Let's say it's a different design. (We are still investigating what we might have lost - in terms of performance - by conforming to the macro assembler.)

> > P.S.: We are happy to maintain the code we write. We have been around for 2 years, but we are still strugling to get us accepted as serious contributors...
>
> We do consider you guys to be serious contributors, we frequently talk to zoltan and ossy on irc, however we really don't like big complex code chunks landing unless we're sure that the complexity is necessary -- this applies to everyone, including other people from Apple.

Well, I've never seen a feature implemented twice just to check its complexity. (Although I can imagine other reasons for the reimplementation...)

Anyway, this is not the right forum to discuss this, and I don't want to start a flame war either. Perhaps we can talk about this in person next week...
Oliver Hunt
Comment 77 2010-05-22 03:07:26 PDT
(In reply to comment #76)
> (In reply to comment #75)
> > > I have a deja vu... It seems to me that we played the same game before. We've been working on the ARM JIT and got our solutions rejected again and again, just to find out later on that someone at Apple is already working on something similar.
> >
> > In the case of ARM we were working on a thumb2 version and i believe the discussions between you guys and gavin led to what was a better design overall.
>
> Let's say, it's a different design. (We are still investigating what we might have lost - in terms of performance - by conforming to the macro assembler.)

When accounting for performance you need to consider all the optimisations we have made that would need to be duplicated if you weren't using the macro assembler.

> > > P.S.: We are happy to maintain the code we write. We have been around for 2 years, but we are still strugling to get us accepted as serious contributors...
> >
> > We do consider you guys to be serious contributors, we frequently talk to zoltan and ossy on irc, however we really don't like big complex code chunks landing unless we're sure that the complexity is necessary -- this applies to everyone, including other people from Apple.
>
> Well, I've never seen before a feature implemented twice just to check it's complexity. (Although, I can imagine other reasons for the reimplementation...)

I feel that the implementation attached to this patch is unnecessarily complicated, and as a project we try not to take code that is more complex than necessary. The problem with complex patches is that they are difficult to understand, and therefore difficult to improve. This may not seem like a problem, but it means that for all we know there could be substantial performance improvements to be made that are hidden behind the complexity. This is why we always strive for simple code, especially on the initial commit (alas, complexity always creeps in over time, but starting off with "complex" just makes it worse).

> Anyway, this is not the right forum to discuss this. And I don't want to start a flame war either. Perhaps we can have a talk about this next week in person...

In all honesty, the way to avoid a flame war is to come on IRC and actually discuss issues, as that allows comments to be elaborated upon and so reduces the likelihood of comments being misconstrued as insulting or passive-aggressive, which in all honesty is what your initial comment came across as.
Zoltan Herczeg
Comment 78 2010-05-22 05:06:23 PDT
> I feel that the implementation attached to this patch is unnecessarily complicated, and as a project we try not to take code that is more complex than necessary.

This is my fault. As you may have guessed, this is not my first parser, and I just threw in heavy optimizations that had proven useful before, without thinking about others. Probably the same thing would happen if I had to reimplement the JIT for a different project: I would just use the experience gained here to make an effective implementation, and others would not understand it, since they don't know the steps I skipped. The parser is fast and has a small memory footprint (including the machine stack), but it is true that it is not easy to understand. Good documentation would be useful here (and not just here - newcomers always find WebKit difficult to understand). Anyway, the parser is not a time-critical project; we should evaluate the options before we decide. And share the experiences on irc :)
Zoltan Horvath
Comment 79 2010-05-22 09:14:37 PDT
Hey, this patch only adds code to WebKit - and what it adds is disabled by default - it doesn't remove anything. For the present, it doesn't demand additional work from other contributors, so I don't think it is asking much to add it to the trunk. We can work on the code complexity, but there is plenty of code in WebKit that is understood only by a couple of people; I can't see the difference here. If somebody wants to improve this area, they will understand it sooner or later, and with good documentation on the trac that can be easy enough. If it is in the trunk, people can try it out easily and make their own decision about using it. If we see that the community doesn't like it, there is still a chance to remove it. I know that WebKit is not an experimental project and we don't want many experimental features in WebKit, but this parser isn't experimental either. I would be glad to see this in the trunk, because I don't see any counter-arguments. Zoltan
Oliver Hunt
Comment 80 2010-05-24 02:11:59 PDT
Created attachment 56863 [details] Recursive descent parser

Here's my weekend's work -- it's a simple recursive descent parser for JS. It is not complete -- it's missing const, debugger, accessors in object literals, and with. Additionally I haven't yet had a chance to run Zoltan's AST dumper, which will be necessary, as I'm fairly sure I've screwed up some of the line and character positions for exceptions, etc. It also doesn't have any real stack guards -- there's a stack checker there at the moment, but I'm basically just using that to judge stack usage, so it simply crashes out if the arbitrary limit is reached.

So with those caveats, I figure I should point out the goodness: I believe this is much more understandable and maintainable than the iterative parser -- it is also much less code, though not by as much as the patch sizes imply, since the iterative parser includes a significant amount of comments. It is currently ~5% slower than the iterative parser, but still much faster than bison -- there may be some trivial improvements to be made, but I'm hesitant to make claims like that, as I don't believe Zoltan has done any real perf optimisations to the iterative parser either (is this correct?).

Anyhoo, I just figured I'd put this up in response to the "how long will it take?" question. Answer: 1 weekend :-D --Oliver
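(For readers who want to see the shape of what is being compared here: below is a minimal sketch of a recursive descent statement parser with the kind of crude depth guard described above. It is an illustration only, not code from attachment 56863; the token names, node type, helper functions, and the depth limit are all invented for the sketch.)

    // Minimal sketch of a recursive descent statement parser with a crude
    // depth guard that simply gives up at an arbitrary nesting limit.
    // All names here (Token, Node, kMaxDepth, the helpers) are hypothetical.
    enum Token { IfToken, OpenParen, CloseParen, IdentToken, EofToken };
    struct Node { };

    class MiniParser {
    public:
        Node* parseStatement()
        {
            if (++m_depth > kMaxDepth) { // arbitrary limit, as described above
                m_error = true;
                return 0;
            }
            Node* result = 0;
            switch (currentToken()) {
            case IfToken:
                result = parseIfStatement(); // recurses through the branches
                break;
            default:
                result = parseExpressionStatement();
                break;
            }
            --m_depth;
            return m_error ? 0 : result;
        }

    private:
        Node* parseIfStatement()
        {
            next(); // consume 'if'
            if (!consume(OpenParen))
                return 0;
            Node* condition = parseExpression();
            if (!consume(CloseParen))
                return 0;
            Node* body = parseStatement(); // one native stack frame per nesting level
            return makeIfNode(condition, body);
        }

        // Stubs so the sketch stands alone.
        Node* parseExpressionStatement() { next(); return new Node; }
        Node* parseExpression() { next(); return new Node; }
        Node* makeIfNode(Node*, Node*) { return new Node; }
        Token currentToken() const { return m_token; }
        void next() { m_token = EofToken; }
        bool consume(Token expected) { if (m_token != expected) { m_error = true; return false; } next(); return true; }

        static const unsigned kMaxDepth = 4096;
        unsigned m_depth = 0;
        bool m_error = false;
        Token m_token = IfToken;
    };

Each nested construct still costs one native stack frame, which is why the stack-guard discussion in the later comments matters.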
Zoltan Herczeg
Comment 81 2010-05-24 02:24:07 PDT
Created attachment 56864 [details] latest patch

Unfortunately, I submitted the wrong patch to bugzilla (version 3 instead of version 4.3).
Zoltan Herczeg
Comment 82 2010-05-24 02:30:27 PDT
> I don't believe Zoltan has done any real perf optimisations to the iterative parser either (is this correct?).

True. My parser even mimics bugs and other quirks that bison has. The focus of my patch was compatibility, and making it possible to apply heavy optimizations after the switch.
Oliver Hunt
Comment 83 2010-05-24 03:00:50 PDT
(In reply to comment #82)
> > I don't believe Zoltan has done any real perf optimisations to the iterative parser either (is this correct?).
>
> True. My parser even mimics bugs and other quirks that bison has. The focus of my patch was compatibility, and making it possible to apply heavy optimizations after the switch.

I agree -- there's so much that can be made better in both, but whichever parser we end up using, we probably want to leave it basically alone for a few weeks to settle in the nightlies before we start making changes to the AST. I'm trapped in meetings from hell for the next two days, so I'll probably do some more tidying of my parsing code during them and will post any significant updates. --Oliver
Akos Kiss
Comment 84 2010-05-24 09:33:59 PDT
(In reply to comment #80)
> Anyhoo, I just figured I'd put this up in response to the "how long will it take?" question. Answer: 1 weekend :-D

Aye, but that's not fair... it's incomplete... ;) -Akos
Oliver Hunt
Comment 85 2010-05-28 23:25:54 PDT
Created attachment 57405 [details] More complete parser

Okay, this now passes everything except one test in fast/js that is actually incorrect (the automatic semicolon insertion in yacc is incorrect -- I'm unsure what Zoltan's patch does for that test). It still has issues (ignoring perf): it uses more stack space than I would like, though its stack usage seems comparable to SpiderMonkey and V8 (quite a bit better than Firefox for the most part, on par with V8's). The stack guard is not thread safe, for no reason other than me being a slacker. I've also realised that the parsing of for-loops is unnecessarily obtuse. Once done with those, I'm going to go through and fix up the rest of the squirrelly code and then see what can be done to make it nicer.
Darin Adler
Comment 86 2010-05-29 21:04:47 PDT
Comment on attachment 57405 [details] More complete parser

> +#define parseError() do { m_error = true; return 0; } while (0)
> +#define failIfFalse(cond) do { if (!(cond)) parseError(); } while(0)
> +#define failIfTrue(cond) do { if ((cond)) parseError(); } while(0)
> +#define assertMatch(tok) ASSERT(match(tok))
> +#define checkStackLimit() do { failIfFalse(allowRecursion()); } while(0)

Why can't these all be inline member functions instead of macros?

> +// Macros to make the more common TreeBuilder types a little less verbose
> +#define TreeStatement typename TreeBuilder::Statement
> +#define TreeExpression typename TreeBuilder::Expression
> +#define TreeFormalParameterList typename TreeBuilder::FormalParameterList
> +#define TreeSourceElements typename TreeBuilder::SourceElements

Why can't these be typedefs instead of macros?
Oliver Hunt
Comment 87 2010-05-29 23:44:06 PDT
(In reply to comment #86)
> (From update of attachment 57405 [details])
> > +#define parseError() do { m_error = true; return 0; } while (0)
> > +#define failIfFalse(cond) do { if (!(cond)) parseError(); } while(0)
> > +#define failIfTrue(cond) do { if ((cond)) parseError(); } while(0)
> > +#define assertMatch(tok) ASSERT(match(tok))
> > +#define checkStackLimit() do { failIfFalse(allowRecursion()); } while(0)
>
> Why can't these all be inline member functions instead of macros?

Because they return from the containing function; if they were functions, they couldn't do this.

> > +// Macros to make the more common TreeBuilder types a little less verbose
> > +#define TreeStatement typename TreeBuilder::Statement
> > +#define TreeExpression typename TreeBuilder::Expression
> > +#define TreeFormalParameterList typename TreeBuilder::FormalParameterList
> > +#define TreeSourceElements typename TreeBuilder::SourceElements
>
> Why can't these be typedefs instead of macros?

Because they don't have a single context in which they could be typedef'd -- TreeBuilder is a template parameter passed to the parsing functions, not to the parser itself.
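(To make the macro-versus-inline-function point concrete, here is a small self-contained sketch. The parseError/failIfFalse macros are the ones quoted above; the surrounding ParserSketch class and its helpers are invented for the illustration and are not the patch's code.)

    // Standalone illustration of why the failure helpers are macros: each use
    // has to be able to return 0 from the *calling* parse function on error.
    #define parseError() do { m_error = true; return 0; } while (0)
    #define failIfFalse(cond) do { if (!(cond)) parseError(); } while (0)

    struct Node { };

    class ParserSketch {
    public:
        Node* parseIfStatement()
        {
            // Expands to: if (!consume('(')) { m_error = true; return 0; }
            // i.e. the early return happens here, inside parseIfStatement itself.
            failIfFalse(consume('('));
            Node* condition = parseExpression();
            failIfFalse(condition);
            failIfFalse(consume(')'));
            return condition; // placeholder for building a real if-node
        }

    private:
        // An inline member function could set m_error, but could not force
        // parseIfStatement to return; every call site would still need an
        // explicit "if (m_error) return 0;" check.
        bool consume(char expected) { if (m_current != expected) return false; advance(); return true; }
        Node* parseExpression() { advance(); return new Node; }
        void advance() { m_current = 0; }

        bool m_error = false;
        char m_current = '(';
    };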
Oliver Hunt
Comment 88 2010-06-06 13:27:46 PDT
Created attachment 57981 [details] Updated parser

This parser is both speedy and complete -- it passes all tests, with the exception of the incorrect semicolon insertion test, and can run in very little stack space -- it currently uses stack guards that match the limit used by SpiderMonkey and V8.

That said, it has a real problem currently: lazy and non-lazy parsing use different amounts of stack space, so it's possible to construct examples where a function body can be parsed lazily, but not in the actual "I'd like an AST" pass. I'm unsure how to deal with this. The obvious solution (making it iterative) seems highly likely to lead to the kind of convoluted code I was aiming to avoid in the first place with this patch. Alternatively, we could restrict the depth of the lazy parsing more than the full parse, but I am fairly sure that it would still be possible to induce badness even in those cases. Another alternative would be to handle reparsing failures and produce a stack overflow, but we also need to reparse in response to an exception, so if the stack is exhausted when we go to reparse, once again we could end up with badness.

WWDC week is upon us once more, so I may be able to spend some time looking into solutions over the course of the week. I'm wondering if it's possible to separate statement and expression parsing and have two iterative parsers, for less complexity than a single merged parser. You'd still recurse, but you'd have bounded recursion -- basically, statement parsing can call the expression parser but can't recurse into the statement parser, and the expression parser would have to be completely iterative. I'm really not sure. /me shrugs
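(For context, a stack guard of the kind referred to here is commonly implemented by recording a limit derived from the stack position at entry and refusing to recurse once the current position crosses it. The sketch below is a generic illustration of that idea, not the implementation in attachment 57981; the 128 kB margin and the local-variable-address trick for estimating the stack pointer are assumptions.)

    // Generic sketch of a stack-position guard like the allowRecursion() check
    // used by the checkStackLimit() macro quoted earlier. Not the patch's code.
    #include <cstdint>

    class StackGuardSketch {
    public:
        explicit StackGuardSketch(uintptr_t availableBytes = 128 * 1024)
        {
            // Record where the stack is when parsing begins; on the common
            // platforms the stack grows downward, so the limit sits below it.
            m_limit = currentStackPosition() - availableBytes;
        }

        // Called before each recursive descent into a statement or expression;
        // returning false means "refuse to recurse" rather than crashing.
        bool allowRecursion() const { return currentStackPosition() > m_limit; }

    private:
        static uintptr_t currentStackPosition()
        {
            // The address of a local variable approximates the stack pointer.
            char probe;
            return reinterpret_cast<uintptr_t>(&probe);
        }

        uintptr_t m_limit;
    };

Because a lazy, syntax-check-only pass and a full AST-building pass burn different amounts of stack per nesting level, a fixed guard like this can succeed in one mode and trip in the other, which is the reparsing hazard described above.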
Oliver Hunt
Comment 89 2010-06-06 13:33:24 PDT
(In reply to comment #88)
> Created an attachment (id=57981) [details]
> Updated parser

The basic changes in this version of the parser are to make assignment chains iterative and to remove the large number of arguments to most of the functions, in order to bring a measure of sanity to things. It can now get through run-jsc-tests with 4k of stack space on 64-bit, I think it was.
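(As an illustration of what "making assignment chains iterative" means in general: a right-associative chain such as a = b = c = d can be parsed with a loop that collects the left-hand sides and then folds them, instead of recursing once per '='. The sketch below is generic and invented for illustration; the token and node types are placeholders, not the patch's types.)

    // Generic sketch of parsing a right-associative assignment chain iteratively.
    #include <vector>

    struct Expr { };
    enum Token { AssignToken, OtherToken };

    class AssignChainSketch {
    public:
        Expr* parseAssignmentExpression()
        {
            std::vector<Expr*> pendingTargets;

            Expr* expr = parseConditionalExpression();
            // Collect left-hand sides in a loop while '=' keeps appearing.
            while (currentToken() == AssignToken) {
                pendingTargets.push_back(expr);
                next(); // consume '='
                expr = parseConditionalExpression();
            }

            // Fold right to left: the final expression is the innermost RHS.
            while (!pendingTargets.empty()) {
                expr = makeAssignNode(pendingTargets.back(), expr);
                pendingTargets.pop_back();
            }
            return expr;
        }

    private:
        // Stubs so the sketch stands alone.
        Expr* parseConditionalExpression() { next(); return new Expr; }
        Expr* makeAssignNode(Expr*, Expr* rhs) { return rhs; }
        Token currentToken() const { return m_token; }
        void next() { m_token = OtherToken; }

        Token m_token = OtherToken;
    };

The depth of the chain then costs heap space in the vector rather than native stack frames.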
Zoltan Herczeg
Comment 90 2010-06-17 05:00:08 PDT
Created attachment 58980 [details] faster parser

After a long silence, I decided to improve my parser a little because the Qt guys are interested in it:
https://lists.webkit.org/pipermail/webkit-qt/2010-June/000632.html

This is great news for me, and I would like to thank them again.

Practically, I reordered some parts of the parser and improved the switches a little. It seems "prototype" liked it.

TEST                         COMPARISON        FROM              TO                DETAILS
=============================================================================
** TOTAL **:                 1.033x as fast    34.4ms +/- 2.0%   33.3ms +/- 1.8%   significant
=============================================================================

  jquery:                    -                 5.3ms +/- 6.5%    5.2ms +/- 5.8%
    1.3.2:                   -                 5.3ms +/- 6.5%    5.2ms +/- 5.8%

  mootools:                  ??                5.3ms +/- 6.5%    5.4ms +/- 6.8%    not conclusive: might be *1.019x as slow*
    1.2.2-core-nc:           ??                5.3ms +/- 6.5%    5.4ms +/- 6.8%    not conclusive: might be *1.019x as slow*

  prototype:                 1.097x as fast    6.8ms +/- 4.4%    6.2ms +/- 4.9%    significant
    1.6.0.3:                 1.097x as fast    6.8ms +/- 4.4%    6.2ms +/- 4.9%    significant

  concat:                    1.030x as fast    17.0ms +/- 0.0%   16.5ms +/- 2.3%   significant
    jquery-mootools-prototype: 1.030x as fast  17.0ms +/- 0.0%   16.5ms +/- 2.3%   significant
Oliver Hunt
Comment 91 2010-06-17 09:50:33 PDT
(In reply to comment #90)
> Created an attachment (id=58980) [details]
> faster parser
>
> After a long silence, I decided to improve my parser a little because the Qt guys are interested in it:
> https://lists.webkit.org/pipermail/webkit-qt/2010-June/000632.html
>
> This is great news for me, and I would like to thank them again.
>
> Practically, I reordered some parts of the parser and improved the switches a little. It seems "prototype" liked it.

The issue with the iterative parser isn't speed -- it was already very fast. The issue is maintainability: the code is very difficult to follow, and it's hard to see how that can be improved. I'll attach a slightly improved version of my parser in a bit, but I have been distracted by other stuff recently :-/
Zoltan Herczeg
Comment 92 2010-06-17 23:44:32 PDT
> The issue with the iterative parser isn't speed -- it was already very fast. The issue is maintainability: the code is very difficult to follow, and it's hard to see how that can be improved.

I think my code is not as bad as the number of "maintainability" mentions implies. It is a state machine - a kind of vertical solution, if your code is the horizontal one. Anyway, although we have a lot of guys here who are experts on code quality, I have never heard that there is a high correlation between code quality and successful projects.
Geoffrey Garen
Comment 93 2010-06-18 12:30:01 PDT
(In reply to comment #92)
> I have never heard that there is a high correlation between code quality and successful projects.

Code quality is an explicitly stated goal of the WebKit project. (See http://webkit.org/projects/goals.html.) If you believe that the WebKit project should change its goals to exclude code quality, please bring the issue up on webkit-dev.

For the purposes of this bug, though, I propose working within the WebKit project goals as they currently stand.
Akos Kiss
Comment 94 2010-06-18 12:48:00 PDT
(In reply to comment #93)
> (In reply to comment #92)
> > I have never heard that there is a high correlation between code quality and successful projects.
>
> Code quality is an explicitly stated goal of the WebKit project. (See http://webkit.org/projects/goals.html.) If you believe that the WebKit project should change its goals to exclude code quality, please bring the issue up on webkit-dev.
>
> For the purposes of this bug, though, I propose working within the WebKit project goals as they currently stand.

I do think that was only an unfortunate comment. Certainly, we don't want to lower but to increase the quality of WebKit's code. Where we don't necessarily agree is whether Zoltan's version is more _complex_ and/or less _maintainable_ than Oliver's implementation. But hopefully both patches are of high _quality_. Thus, should either of them be selected for landing, WebKit can only benefit from it.
Zoltan Herczeg
Comment 95 2010-06-18 14:16:30 PDT
> Code quality is an explicitly stated goal of the WebKit project. (See http://webkit.org/projects/goals.html.) If you believe that the WebKit project should change its goals to exclude code quality, please bring the issue up on webkit-dev.
>
> For the purposes of this bug, though, I propose working within the WebKit project goals as they currently stand.

Ok, that comment was too much, I know, but hearing this "unmaintainable" mantra over and over makes me feel that I have no clue about software development. I still think that my work is not that bad, and you could help me improve it.
Oliver Hunt
Comment 96 2010-06-23 18:17:12 PDT
WebKit Review Bot
Comment 97 2010-06-23 18:23:01 PDT
Attachment 59594 [details] did not pass style-queue:

Failed to run "['WebKitTools/Scripts/check-webkit-style', '--no-squash']" exit_code: 1
Last 3072 characters of output:
itespace/indent] [4]
JavaScriptCore/parser/JSParser.cpp:1054: Place brace on its own line for function definitions. [whitespace/braces] [4]
JavaScriptCore/parser/JSParser.cpp:1056: A case label should not be indented, but line up with its switch statement. [whitespace/indent] [4]
JavaScriptCore/parser/JSParser.cpp:1157: A case label should not be indented, but line up with its switch statement. [whitespace/indent] [4]
JavaScriptCore/parser/JSParser.cpp:1165: More than one command on the same line [whitespace/newline] [4]
JavaScriptCore/parser/JSParser.cpp:1208: One space before end of line comments [whitespace/comments] [5]
JavaScriptCore/parser/JSParser.cpp:1208: Should have a space between // and comment [whitespace/comments] [4]
JavaScriptCore/parser/JSParser.cpp:1267: A case label should not be indented, but line up with its switch statement. [whitespace/indent] [4]
JavaScriptCore/parser/JSParser.cpp:1319: One line control clauses should not use braces. [whitespace/braces] [4]
JavaScriptCore/parser/JSParser.cpp:1378: A case label should not be indented, but line up with its switch statement. [whitespace/indent] [4]
JavaScriptCore/parser/JSParser.cpp:1442: A case label should not be indented, but line up with its switch statement. [whitespace/indent] [4]
JavaScriptCore/parser/JSParser.cpp:1463: A case label should not be indented, but line up with its switch statement. [whitespace/indent] [4]
JavaScriptCore/parser/ASTBuilder.h:37: Missing space before { [whitespace/braces] [5]
JavaScriptCore/parser/ASTBuilder.h:60: Missing space before { [whitespace/braces] [5]
JavaScriptCore/parser/ASTBuilder.h:134: More than one command on the same line [whitespace/newline] [4]
JavaScriptCore/parser/ASTBuilder.h:140: Place brace on its own line for function definitions. [whitespace/braces] [4]
JavaScriptCore/parser/ASTBuilder.h:759: A case label should not be indented, but line up with its switch statement. [whitespace/indent] [4]
JavaScriptCore/parser/ASTBuilder.h:846: An else statement can be removed when the prior "if" concludes with a return, break, continue or goto statement. [readability/control_flow] [4]
JavaScriptCore/parser/ASTBuilder.h:855: An else statement can be removed when the prior "if" concludes with a return, break, continue or goto statement. [readability/control_flow] [4]
JavaScriptCore/parser/Lexer.cpp:29: Alphabetical sorting problem. [build/include_order] [4]
JavaScriptCore/parser/JSParser.h:110: first_line is incorrectly named. Don't use underscores in your identifier names. [readability/naming] [4]
JavaScriptCore/parser/JSParser.h:111: last_line is incorrectly named. Don't use underscores in your identifier names. [readability/naming] [4]
JavaScriptCore/parser/JSParser.h:112: first_column is incorrectly named. Don't use underscores in your identifier names. [readability/naming] [4]
JavaScriptCore/parser/JSParser.h:113: last_column is incorrectly named. Don't use underscores in your identifier names. [readability/naming] [4]
Total errors found: 38 in 17 files

If any of these errors are false positives, please file a bug against check-webkit-style.
Oliver Hunt
Comment 98 2010-06-23 19:11:43 PDT
WebKit Review Bot
Comment 99 2010-06-23 19:15:14 PDT
Attachment 59603 [details] did not pass style-queue:

Failed to run "['WebKitTools/Scripts/check-webkit-style', '--no-squash']" exit_code: 1
JavaScriptCore/parser/SyntaxChecker.h:148: More than one command on the same line [whitespace/newline] [4]
JavaScriptCore/parser/SyntaxChecker.h:149: More than one command on the same line [whitespace/newline] [4]
JavaScriptCore/parser/SyntaxChecker.h:153: More than one command on the same line [whitespace/newline] [4]
JavaScriptCore/parser/JSParser.cpp:35: Alphabetical sorting problem. [build/include_order] [4]
JavaScriptCore/parser/JSParser.cpp:36: Alphabetical sorting problem. [build/include_order] [4]
JavaScriptCore/parser/Lexer.cpp:29: Alphabetical sorting problem. [build/include_order] [4]
JavaScriptCore/parser/JSParser.h:110: first_line is incorrectly named. Don't use underscores in your identifier names. [readability/naming] [4]
JavaScriptCore/parser/JSParser.h:111: last_line is incorrectly named. Don't use underscores in your identifier names. [readability/naming] [4]
JavaScriptCore/parser/JSParser.h:112: first_column is incorrectly named. Don't use underscores in your identifier names. [readability/naming] [4]
JavaScriptCore/parser/JSParser.h:113: last_column is incorrectly named. Don't use underscores in your identifier names. [readability/naming] [4]
Total errors found: 10 in 17 files

If any of these errors are false positives, please file a bug against check-webkit-style.
Gavin Barraclough
Comment 100 2010-06-23 21:12:19 PDT
Comment on attachment 59603 [details] Patch

Comparing Identifier == char* in createGetterOrSetterProperty makes me a little sad.

    bool consume(int expected)
This looks a little screwy?

    if (!result) {
        m_error = result;
Is this what you really mean here? If so, it's probably clearer written as:
    if (!result) {
        m_error = false;
Or is this inverted? If so, I think you want:
    if (!result)
        fail();
?

    202 };
    203 int jsParse(JSGlobalData* globalData)
pls to be add a newline here.

    229 }
    230
    231
    232 bool JSParser::parseProgram()
pls to be remove a newline here. ;-D

    597     consumeOrFail(CLOSEBRACE);
    598     return context.createSwitchStatement(expr, firstClauses, defaultClause, secondClauses, startLine, endLine);
    599
    600 }
and again.

    #define ENABLE_RECURSIVE_PARSE 1
Will this go away completely in your next patch? - I strongly hope so.

r+, please please please to be removing the old parser asap.
Oliver Hunt
Comment 101 2010-06-23 21:19:48 PDT
WebKit Review Bot
Comment 102 2010-06-23 21:44:28 PDT
http://trac.webkit.org/changeset/61732 might have broken Qt Linux Release
Eric Seidel (no email)
Comment 103 2010-06-23 23:08:11 PDT
Seems to have borked lots of bots. :(
Csaba Osztrogonác
Comment 104 2010-06-24 06:25:33 PDT
(In reply to comment #103)
> Seems to have borked lots of bots. :(

Tiger is still dead :((
http://build.webkit.org/builders/Tiger%20Intel%20Release/builds/13588