Bug 170099

Summary: Web Inspector: Add regular expression support to XHR breakpoints
Product: WebKit Reporter: Matt Baker <mattbaker>
Component: Web InspectorAssignee: Matt Baker <mattbaker>
Status: RESOLVED FIXED    
Severity: Normal CC: bburg, buildbot, commit-queue, hi, inspector-bugzilla-changes, joepeck, keith_miller, mark.lam, msaboff, saam, webkit-bug-importer
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: All   
OS: All   
Attachments:
Description Flags
[Image] Popover with new regex option
none
[Image] Tree element w/ regex subtitle
none
[Image] CodeMirror mode for MIME type "text/x-regex"
none
Patch
none
[Image] improved character set tokenizer
none
[Image] improved subexpression nesting
none
Patch
none
Patch
none
Patch
none
[Image] improved character set tokenizer 2 none

Description Matt Baker 2017-03-25 09:36:32 PDT
Summary:
Add regular expression support to XHR breakpoints.

Tree element titles for XHR breakpoints should be as follows:

URL – "foo"
URL – /foo\d/

The popover should present this option as a <select> picker preceding the input field, with options "Contains" and "Matches".
Comment 1 Radar WebKit Bug Importer 2017-04-11 09:39:57 PDT
<rdar://problem/31558082>
Comment 2 Matt Baker 2017-04-17 22:45:00 PDT
Created attachment 307346 [details]
[Image] Popover with new regex option
Comment 3 Matt Baker 2017-04-17 22:46:19 PDT
Created attachment 307347 [details]
[Image] Tree element w/ regex subtitle
Comment 4 Matt Baker 2017-04-17 22:48:41 PDT
Created attachment 307348 [details]
[Image] CodeMirror mode for MIME type "text/x-regex"

The colors are (loosely) based off TextMate's regex find-text field.
Comment 5 Matt Baker 2017-04-17 23:13:48 PDT
Created attachment 307349 [details]
Patch
Comment 6 Joseph Pecoraro 2017-04-18 01:14:53 PDT
Comment on attachment 307349 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=307349&action=review

Cool! Some quick comments tonight on the regex syntax highlighting.

> Source/WebInspectorUI/UserInterface/Views/CodeMirrorRegexMode.js:32
> +            if (stream.match(/[dDwWsStrnvf0]/))
> +                return "regex-special";

You'll need to add [bB] to your list for word boundaries:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp#boundaries

I notice this handles \0 (NUL) but you don't have anything special for the \1-\9 backreferences.
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp#grouping-back-references

I think you probably want a special category, or just regex-special.

    if (stream.match(/[1-9]/))
        return "regex-backreference";

> Source/WebInspectorUI/UserInterface/Views/CodeMirrorRegexMode.js:61
> +        if (ch === "(") {
> +            stream.match(/\?:/);
> +            return "regex-group";

I see this groups /(abc)/ and non-captured groups /(?:abc/)/.

You may want special productions for positive lookahead /(?=abc)/ and negative lookahead /(?!=abc)/:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp#assertions

> Source/WebInspectorUI/UserInterface/Views/CodeMirrorRegexMode.js:68
> +        if (/[\*\+\?]/.test(ch))

The escapes are not necessary inside character sets. I'll leave the style up to you.

> Source/WebInspectorUI/UserInterface/Views/CodeMirrorRegexMode.js:89
> +        if (ch === "[") {
> +            stream.match(/.*]/);

This doesn't properly handle a character set with a ']' character inside it:

    /before[abc\]def]after/

Probably not very common.

Also, I see a lot of editors specially handle the negation case [^abc] by highlighting the "^" specially. Maybe we could do something there.

> Source/WebInspectorUI/UserInterface/Views/CodeMirrorRegexMode.js:94
> +        if (/[\.\^\$\|]/.test(ch))

Same thing here. Don't need to escape in the character set (as long as the ^ is not the first!). But I'll leave that style up to you.
Comment 7 Joseph Pecoraro 2017-04-18 01:28:41 PDT
Comment on attachment 307349 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=307349&action=review

I haven't completed review but I got carried away looking at it more tonight. This looks great!

> Source/JavaScriptCore/inspector/protocol/DOMDebugger.json:62
> +                { "name": "isRegEx", "type": "boolean", "optional": true, "descriprion": "Whether the URL string is a regular expression." }

We have existing spelling in the protocol of: "isRegex" for a boolean and "regexp" for RemoteObject subtype. Lets be consistent here with "isRegex"

> Source/WebCore/inspector/InspectorDOMDebuggerAgent.h:103
> -    HashSet<String> m_xhrBreakpoints;
> +    HashMap<String, bool> m_xhrBreakpoints;

An enum would be much clearer than a boolean.

> Source/WebInspectorUI/UserInterface/Models/XHRBreakpoint.js:81
> +    Text: "xhr-breakpoint-type-text",
> +    RegularExpression: "xhr-breakpoint-type-regular-expression",

Since we serialize these I wonder if shorter is better. Just "text" and "regex".

> Source/WebInspectorUI/UserInterface/Views/CodeMirrorRegexMode.css:2
> + * Copyright (C) 2013 Apple Inc. All rights reserved.

2017.

> Source/WebInspectorUI/UserInterface/Views/XHRBreakpointPopover.js:42
> +

Style: // Public

> Source/WebInspectorUI/UserInterface/Views/XHRBreakpointPopover.js:73
> +        addOption(WebInspector.UIString("Containing"), WebInspector.XHRBreakpoint.Type.Text);
> +        addOption(WebInspector.UIString("Matching"), WebInspector.XHRBreakpoint.Type.RegularExpression);

I think something in the UI must make it clear that in the matching case the user is inputting a regular expression.

> Source/WebInspectorUI/UserInterface/Views/XHRBreakpointPopover.js:89
> +

Style: // Private

> Source/WebInspectorUI/UserInterface/Views/XHRBreakpointTreeElement.js:41
> +            if (breakpoint.type === WebInspector.XHRBreakpoint.Type.Text)
> +                subtitle = doubleQuotedString(breakpoint.url);
> +            else
> +                subtitle = "/" + breakpoint.url + "/";

I wonder if we should color regex the usual /orange/ and strings the usual "red" like they appear in the console (using the formatted value colors). Might be too distracting?
Comment 8 Matt Baker 2017-04-18 14:53:50 PDT
Comment on attachment 307349 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=307349&action=review

>> Source/WebInspectorUI/UserInterface/Views/CodeMirrorRegexMode.js:32
>> +                return "regex-special";
> 
> You'll need to add [bB] to your list for word boundaries:
> https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp#boundaries
> 
> I notice this handles \0 (NUL) but you don't have anything special for the \1-\9 backreferences.
> https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp#grouping-back-references
> 
> I think you probably want a special category, or just regex-special.
> 
>     if (stream.match(/[1-9]/))
>         return "regex-backreference";

Will add "regex-backreference", and style them identically to regex-special (for now).

>> Source/WebInspectorUI/UserInterface/Views/CodeMirrorRegexMode.js:61
>> +            return "regex-group";
> 
> I see this groups /(abc)/ and non-captured groups /(?:abc/)/.
> 
> You may want special productions for positive lookahead /(?=abc)/ and negative lookahead /(?!=abc)/:
> https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp#assertions

Will add "regex-lookaround" for these two cases, and color them identically to regex-group (for now).

>> Source/WebInspectorUI/UserInterface/Views/XHRBreakpointPopover.js:73
>> +        addOption(WebInspector.UIString("Matching"), WebInspector.XHRBreakpoint.Type.RegularExpression);
> 
> I think something in the UI must make it clear that in the matching case the user is inputting a regular expression.

The input field placeholder text makes this clear when the dialog appears:

Break on request with URL:
[Containing]  Text__________________
[Matching  ]  Regular Expression__

>> Source/WebInspectorUI/UserInterface/Views/XHRBreakpointTreeElement.js:41
>> +                subtitle = "/" + breakpoint.url + "/";
> 
> I wonder if we should color regex the usual /orange/ and strings the usual "red" like they appear in the console (using the formatted value colors). Might be too distracting?

I tried this early on and found it pretty distracting.
Comment 9 Matt Baker 2017-04-19 12:29:48 PDT
Created attachment 307497 [details]
[Image] improved character set tokenizer

The character set tokenizer has been improved, and now supports the following close bracket scenarios:

[]]
[^]]
[abc\]def]
Comment 10 Matt Baker 2017-04-19 12:33:38 PDT
And it produces a new "regex-character-set-negate" style token for a leading "^". So the following regular expression:

[^abc]

will produce the token stream:

[regex-character-set, regex-character-set-negate, regex-character-set]
Comment 11 Matt Baker 2017-04-19 12:36:57 PDT
Created attachment 307498 [details]
[Image] improved subexpression nesting

Nested capturing/non-capturing groups and lookheads are handled by a context stack, so they get styled correctly too.
Comment 12 Matt Baker 2017-04-19 13:17:36 PDT
Created attachment 307502 [details]
Patch
Comment 13 Joseph Pecoraro 2017-04-19 15:18:50 PDT
Comment on attachment 307502 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=307502&action=review

r=me

> Source/WebCore/inspector/InspectorDOMDebuggerAgent.cpp:412
> +            int matchesCount = ContentSearchUtilities::countRegularExpressionMatches(regex, url);
> +            if (matchesCount) {

We can just check for a single match and not run the regex multiple times:

    bool hasMatch = regex.match(url) != -1;
    if (hasMatch) {

> Source/WebCore/inspector/InspectorDOMDebuggerAgent.h:109
> +    enum XHRBreakpointType {
> +        Text = 0,
> +        RegularExpression,
> +    };
> +
> +    HashMap<String, XHRBreakpointType> m_xhrBreakpoints;

Nit: enum class (I'm pretty sure that works here, you might need to give it a type tho) and you can just make it one line.

> Source/WebInspectorUI/UserInterface/Views/CodeMirrorRegexMode.css:39
> +    color: var(--text-color-gray-medium);

Maybe create a new syntax-highlight-regex-group-color?

> Source/WebInspectorUI/UserInterface/Views/CodeMirrorRegexMode.js:36
> +        // An unescaped closing bracket is allowed if it immediately follows the
> +        // opening bracket, or the negating caret.

I mis-interpreted "an unescaped closing bracket is allowed" to mean this would be the equivalent of a closing bracket. I'd suggest rewriting this to something like:

    // Check for early closing bracket to close the character set.
    // We are guaranteed not to have an escape character this early.

> Source/WebInspectorUI/UserInterface/Views/CodeMirrorRegexMode.js:83
> +        if (stream.next())
> +            return "regex-literal";

We should return an error if there is no character left. So after this I'd expect a return "error".

    js> new RegExp("\\")
    Exception: SyntaxError: Invalid regular expression: \ at end of pattern

> Source/WebInspectorUI/UserInterface/Views/XHRBreakpointTreeElement.js:41
> +                subtitle = "/" + breakpoint.url + "/";

Hmm, if we show it with /s we might want to consider escaping /s in the regex, which are likely to be common in regexs. But I suspect this won't be a problem so I think keeping it as is is fine for now.
Comment 14 Joseph Pecoraro 2017-04-19 15:22:58 PDT
(In reply to Matt Baker from comment #9)
> Created attachment 307497 [details]
> [Image] improved character set tokenizer
> 
> The character set tokenizer has been improved, and now supports the
> following close bracket scenarios:
> 
> []]

I can't find anything it matches with. In fact:

    /[]a/.test("a") // false
    /[]a/.test("[]a") // false

I have no idea what happens when there is a "[]" in a regex.
Comment 15 Joseph Pecoraro 2017-04-19 15:51:22 PDT
Comment on attachment 307502 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=307502&action=review

> Source/WebInspectorUI/UserInterface/Views/CodeMirrorRegexMode.js:44
> +            if (stream.eat("\\")) {
> +                stream.next();
> +                return "regex-character-set";
> +            }

Also, apparently \w \d and friends are still special in character sets: [\d] === [0-9]

So you could give those a regex-special appearance.
Comment 16 Matt Baker 2017-04-19 20:36:25 PDT
(In reply to Joseph Pecoraro from comment #14)
> (In reply to Matt Baker from comment #9)
> > Created attachment 307497 [details]
> > [Image] improved character set tokenizer
> > 
> > The character set tokenizer has been improved, and now supports the
> > following close bracket scenarios:
> > 
> > []]
> 
> I can't find anything it matches with. In fact:
> 
>     /[]a/.test("a") // false
>     /[]a/.test("[]a") // false
> 
> I have no idea what happens when there is a "[]" in a regex.

[]] is equivalent to [\]].
Comment 17 Matt Baker 2017-04-19 20:44:16 PDT
Comment on attachment 307502 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=307502&action=review

>> Source/WebInspectorUI/UserInterface/Views/CodeMirrorRegexMode.css:39
>> +    color: var(--text-color-gray-medium);
> 
> Maybe create a new syntax-highlight-regex-group-color?

Sounds good. Initially I considered doing a few --syntax-highlight-regex-*-color vars, but didn't want to get too carried away. This one makes sense to me, and we could add more later if needed.

>> Source/WebInspectorUI/UserInterface/Views/CodeMirrorRegexMode.js:44
>> +            }
> 
> Also, apparently \w \d and friends are still special in character sets: [\d] === [0-9]
> 
> So you could give those a regex-special appearance.

Currently regex-special is purple (--syntax-highlight-color-boolean), which is visually pretty close to red (regex-character-set) when the characters are adjacent.

I think it's still worthwhile producing more precise tokens, to make future customization easier.
Comment 18 Joseph Pecoraro 2017-04-19 22:21:02 PDT
> []] is equivalent to [\]].

That does not appear to be the case:

    /[]]/.test("]]]]]") // => false

In any case:

    []
    [^]

Are both valid, complete, character sets:
https://tc39.github.io/ecma262/#prod-CharacterClass

> CharacterClass::
>     '[' [lookahead ∉ { '^' }] ClassRanges ']'
>     '[' '^' ClassRanges ']'
> 
> ClassRanges[U]::
>    [empty]
>    NonemptyClassRanges

I just think the comment about this section was super weird.
Comment 19 Matt Baker 2017-04-20 02:05:04 PDT
Created attachment 307580 [details]
Patch
Comment 20 Matt Baker 2017-04-20 12:31:50 PDT
(In reply to Joseph Pecoraro from comment #18)
> > []] is equivalent to [\]].
> 
> That does not appear to be the case:
> 
>     /[]]/.test("]]]]]") // => false
> 
> In any case:
> 
>     []
>     [^]
> 
> Are both valid, complete, character sets:
> https://tc39.github.io/ecma262/#prod-CharacterClass

My mistake, I was looking at a PCRE reference. Will update character set tokenizer to match the spec.
Comment 21 Matt Baker 2017-04-20 12:54:33 PDT
Created attachment 307620 [details]
Patch
Comment 22 Matt Baker 2017-04-20 13:03:03 PDT
Created attachment 307621 [details]
[Image] improved character set tokenizer 2

Escape sequences inside character sets now have their own styling.
Comment 23 WebKit Commit Bot 2017-04-20 16:09:57 PDT
Comment on attachment 307620 [details]
Patch

Clearing flags on attachment: 307620

Committed r215584: <http://trac.webkit.org/changeset/215584>
Comment 24 WebKit Commit Bot 2017-04-20 16:09:59 PDT
All reviewed patches have been landed.  Closing bug.