RESOLVED FIXED 33330
Web Inspector: Regex-based syntax highlighting is slow.
https://bugs.webkit.org/show_bug.cgi?id=33330
Summary Web Inspector: Regex-based syntax highlighting is slow.
Pavel Feldman
Reported 2010-01-07 10:53:50 PST
As of today, we are executing too many regexes against too many substrings in highlighter. Every time regex does not match, we lose all the intermediate calculations and move to another regex. So if "String" did not match, we will try "StringStart" although they are very similar. Instead of doing regexes, we should try implementing more efficient lexer. Unfortunately, there are no good parser generators for JavaScript yet, I tried a bunch of them. There is also no good grammar for ECMAScript. So I ended up using re2c (fast regex-based lexer generator) post-processed with a handful of sed scripts to convert from C to JavaScript. It gave me >10x performance boost on highlighting 30K lines file. The footprint of the generated lexer is almost 100K though which is 10 times more than the source. Patch will follow shortly (it will be based on text editor from 33001 that is not yet landed).
Attachments
[PATCH] Proposed change (122.81 KB, patch)
2010-01-07 11:04 PST, Pavel Feldman
timothy: review+
Pavel Feldman
Comment 1 2010-01-07 11:04:04 PST
Created attachment 46065 [details] [PATCH] Proposed change
Timothy Hatcher
Comment 2 2010-01-07 11:56:54 PST
Comment on attachment 46065 [details] [PATCH] Proposed change > + this._tokens = []; Tab. > + this._conditions = { > + DIV: 0, > + NODIV: 1, > + COMMENT: 2, > + DSTRING: 3, > + SSTRING: 4, > + REGEX: 5 > + }; > + > + this.case_DIV = 1000; > + this.case_NODIV = 1001; > + this.case_COMMENT = 1002; > + this.case_DSTRING = 1003; > + this.case_SSTRING = 1004; > + this.case_REGEX = 1005; > +} Can we camelcase these? I suspect not. > +WebInspector.JavaScriptTokenizer.prototype = { > + > + setLine: function(line) { Remove extra line. Getter and setter? > + getCondition: function() > + { > + return this._condition; > + }, > + > + setCondition: function(condition) > + { > + this._condition = condition; > + }, Getter and setter? > + while (1) { > + switch (goto) > + /*!re2c > + re2c:define:YYCTYPE = "var"; Why is two space indented? You should add a JS comment before this like "Start of Autogenerated stuff.." (better wording.)
Pavel Feldman
Comment 3 2010-01-08 03:51:17 PST
(In reply to comment #2) > Tab. > Fixed. > > > Can we camelcase these? I suspect not. > In theory, we could. But we need to distinguish between rules (camel case) and conditions (all caps). In most re2c I've seen all caps for conditions. > > > +WebInspector.JavaScriptTokenizer.prototype = { > > + > > + setLine: function(line) { > > Remove extra line. Getter and setter? > Done. > > > + getCondition: function() > > + { > > + return this._condition; > > + }, > > + > > + setCondition: function(condition) > > + { > > + this._condition = condition; > > + }, > > Getter and setter? > Generated code is calling into these. It is using a C function template for that. > > > + while (1) { > > + switch (goto) > > + /*!re2c > > + re2c:define:YYCTYPE = "var"; > > Why is two space indented? You should add a JS comment before this like "Start > of Autogenerated stuff.." (better wording.) Done. Committing to http://svn.webkit.org/repository/webkit/trunk ... D WebCore/inspector/front-end/JavaScriptHighlighterScheme.js M WebCore/ChangeLog M WebCore/WebCore.gypi M WebCore/WebCore.vcproj/WebCore.vcproj A WebCore/inspector/front-end/JavaScriptTokenizer.js A WebCore/inspector/front-end/JavaScriptTokenizer.re2js M WebCore/inspector/front-end/TextEditorHighlighter.js M WebCore/inspector/front-end/WebKit.qrc M WebCore/inspector/front-end/inspector.html Committed r52985
Note You need to log in before you can comment on or make changes to this bug.