Bug 33330 - Web Inspector: Regex-based syntax highlighting is slow.
Summary: Web Inspector: Regex-based syntax highlighting is slow.
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Web Inspector (Deprecated) (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Pavel Feldman
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-01-07 10:53 PST by Pavel Feldman
Modified: 2010-01-08 03:51 PST (History)
4 users (show)

See Also:


Attachments
[PATCH] Proposed change (122.81 KB, patch)
2010-01-07 11:04 PST, Pavel Feldman
timothy: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Pavel Feldman 2010-01-07 10:53:50 PST
As of today, we are executing too many regexes against too many substrings in highlighter. Every time regex does not match, we lose all the intermediate calculations and move to another regex. So if "String" did not match, we will try "StringStart" although they are very similar. Instead of doing regexes, we should try implementing more efficient lexer.

Unfortunately, there are no good parser generators for JavaScript yet, I tried a bunch of them. There is also no good grammar for ECMAScript. So I ended up using re2c (fast regex-based lexer generator) post-processed with a handful of sed scripts to convert from C to JavaScript.

It gave me >10x performance boost on highlighting 30K lines file. The footprint of the generated lexer is almost 100K though which is 10 times more than the source. Patch will follow shortly (it will be based on text editor from 33001 that is not yet landed).
Comment 1 Pavel Feldman 2010-01-07 11:04:04 PST
Created attachment 46065 [details]
[PATCH] Proposed change
Comment 2 Timothy Hatcher 2010-01-07 11:56:54 PST
Comment on attachment 46065 [details]
[PATCH] Proposed change


> +	this._tokens = [];

Tab.


> +    this._conditions = {
> +        DIV: 0,
> +        NODIV: 1,
> +        COMMENT: 2,
> +        DSTRING: 3,
> +        SSTRING: 4,
> +        REGEX: 5
> +    };
> +
> +    this.case_DIV = 1000;
> +    this.case_NODIV = 1001;
> +    this.case_COMMENT = 1002;
> +    this.case_DSTRING = 1003;
> +    this.case_SSTRING = 1004;
> +    this.case_REGEX = 1005;
> +}

Can we camelcase these? I suspect not.


> +WebInspector.JavaScriptTokenizer.prototype = {
> +
> +    setLine: function(line) {

Remove extra line. Getter and setter?


> +    getCondition: function()
> +    {
> +        return this._condition;
> +    },
> +
> +    setCondition: function(condition)
> +    {
> +        this._condition = condition;
> +    },

Getter and setter?


> +        while (1) {
> +            switch (goto)
> +            /*!re2c
> +                re2c:define:YYCTYPE  = "var";

Why is two space indented? You should add a JS comment before this like "Start of Autogenerated stuff.." (better wording.)
Comment 3 Pavel Feldman 2010-01-08 03:51:17 PST
(In reply to comment #2)

> Tab.
> 

Fixed.
> 
> 
> Can we camelcase these? I suspect not.
> 

In theory, we could. But we need to distinguish between rules (camel case) and conditions (all caps). In most re2c I've seen all caps for conditions.
> 
> > +WebInspector.JavaScriptTokenizer.prototype = {
> > +
> > +    setLine: function(line) {
> 
> Remove extra line. Getter and setter?
> 

Done.

> 
> > +    getCondition: function()
> > +    {
> > +        return this._condition;
> > +    },
> > +
> > +    setCondition: function(condition)
> > +    {
> > +        this._condition = condition;
> > +    },
> 
> Getter and setter?
> 

Generated code is calling into these. It is using a C function template for that.

> 
> > +        while (1) {
> > +            switch (goto)
> > +            /*!re2c
> > +                re2c:define:YYCTYPE  = "var";
> 
> Why is two space indented? You should add a JS comment before this like "Start
> of Autogenerated stuff.." (better wording.)

Done.

Committing to http://svn.webkit.org/repository/webkit/trunk ...
	D	WebCore/inspector/front-end/JavaScriptHighlighterScheme.js
	M	WebCore/ChangeLog
	M	WebCore/WebCore.gypi
	M	WebCore/WebCore.vcproj/WebCore.vcproj
	A	WebCore/inspector/front-end/JavaScriptTokenizer.js
	A	WebCore/inspector/front-end/JavaScriptTokenizer.re2js
	M	WebCore/inspector/front-end/TextEditorHighlighter.js
	M	WebCore/inspector/front-end/WebKit.qrc
	M	WebCore/inspector/front-end/inspector.html
Committed r52985