Bug 176435 - Add support for RegExp named capture groups
Summary: Add support for RegExp named capture groups
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: Other
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Michael Saboff
URL:
Keywords: InRadar
Depends on:
Blocks: 178174 205783
  Show dependency treegraph
 
Reported: 2017-09-05 18:10 PDT by Michael Saboff
Modified: 2020-03-24 13:08 PDT (History)
9 users (show)

See Also:


Attachments
Patch (53.04 KB, patch)
2017-09-05 18:24 PDT, Michael Saboff
buildbot: commit-queue-
Details | Formatted Diff | Diff
Archive of layout-test-results from ews107 for mac-elcapitan-wk2 (1.58 MB, application/zip)
2017-09-05 19:55 PDT, Build Bot
no flags Details
Archive of layout-test-results from ews112 for mac-elcapitan (2.38 MB, application/zip)
2017-09-05 19:58 PDT, Build Bot
no flags Details
Archive of layout-test-results from ews101 for mac-elcapitan (1.55 MB, application/zip)
2017-09-05 20:02 PDT, Build Bot
no flags Details
Archive of layout-test-results from ews125 for ios-simulator-wk2 (1.41 MB, application/zip)
2017-09-05 20:16 PDT, Build Bot
no flags Details
Updated patch addressing review comments and test failures. (53.99 KB, patch)
2017-09-06 15:24 PDT, Michael Saboff
fpizlo: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Michael Saboff 2017-09-05 18:10:10 PDT
This is to add support for the ECMAScript proposed Regular Expression named capture groups.  See https://github.com/tc39/proposal-regexp-named-groups for details.

For example, the RegExp /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/
when matched would create a match object result with properties
group.year, group.month and group.day in addition to the array like capture results.
Comment 1 Michael Saboff 2017-09-05 18:10:34 PDT
<rdar://problem/33183183>
Comment 2 Michael Saboff 2017-09-05 18:24:37 PDT
Created attachment 319965 [details]
Patch

The style checker errors are due to the checker looking at code that didn't change or new code following existing formatting.
Comment 3 Build Bot 2017-09-05 18:26:34 PDT
Attachment 319965 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/yarr/YarrSyntaxChecker.cpp:47:  Missing space inside { }.  [whitespace/braces] [5]
ERROR: Source/JavaScriptCore/yarr/YarrSyntaxChecker.cpp:51:  Missing space inside { }.  [whitespace/braces] [5]
ERROR: Source/JavaScriptCore/yarr/YarrPattern.cpp:528:  Should have only a single space after a punctuation in a comment.  [whitespace/comments] [5]
ERROR: Source/JavaScriptCore/yarr/YarrPattern.cpp:957:  One space before end of line comments  [whitespace/comments] [5]
ERROR: Source/JavaScriptCore/yarr/YarrPattern.cpp:958:  One space before end of line comments  [whitespace/comments] [5]
Total errors found: 5 in 21 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 4 Build Bot 2017-09-05 19:14:28 PDT
Comment on attachment 319965 [details]
Patch

Attachment 319965 [details] did not pass jsc-ews (mac):
Output: http://webkit-queues.webkit.org/results/4458091

New failing tests:
jsc-layout-tests.yaml/js/script-tests/string_replace_function.js.layout-ftl-no-cjit
mozilla-tests.yaml/ecma_2/RegExp/hex-001.js.mozilla-ftl-eager-no-cjit-validate-phases
mozilla-tests.yaml/ecma_2/RegExp/unicode-001.js.mozilla-ftl-eager-no-cjit-validate-phases
jsc-layout-tests.yaml/js/script-tests/regexp-in-and-foreach-handling.js.layout-ftl-eager-no-cjit
jsc-layout-tests.yaml/js/script-tests/string_replace_function.js.layout-no-llint
mozilla-tests.yaml/ecma_2/RegExp/hex-001.js.mozilla-dfg-eager-no-cjit-validate-phases
jsc-layout-tests.yaml/js/script-tests/string_replace_function.js.layout-ftl-eager-no-cjit
jsc-layout-tests.yaml/js/script-tests/regexp-in-and-foreach-handling.js.layout-dfg-eager-no-cjit
jsc-layout-tests.yaml/js/script-tests/string_replace_function.js.layout-dfg-eager-no-cjit
microbenchmarks/emscripten-cube2hash.js.ftl-eager-no-cjit-b3o1
mozilla-tests.yaml/ecma_2/RegExp/unicode-001.js.mozilla-dfg-eager-no-cjit-validate-phases
jsc-layout-tests.yaml/js/script-tests/string_replace_function.js.layout
stress/regexp-replace-proxy.js.ftl-eager-no-cjit
mozilla-tests.yaml/ecma_2/RegExp/exec-002.js.mozilla-ftl-eager-no-cjit-validate-phases
jsc-layout-tests.yaml/js/script-tests/string_replace_function.js.layout-no-cjit
jsc-layout-tests.yaml/js/script-tests/string_replace_function.js.layout-no-ftl
Comment 5 Build Bot 2017-09-05 19:55:26 PDT
Comment on attachment 319965 [details]
Patch

Attachment 319965 [details] did not pass mac-wk2-ews (mac-wk2):
Output: http://webkit-queues.webkit.org/results/4458230

New failing tests:
js/string_replace_function.html
js/dom/string-replace-3.html
Comment 6 Build Bot 2017-09-05 19:55:27 PDT
Created attachment 319971 [details]
Archive of layout-test-results from ews107 for mac-elcapitan-wk2

The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: ews107  Port: mac-elcapitan-wk2  Platform: Mac OS X 10.11.6
Comment 7 Build Bot 2017-09-05 19:58:20 PDT
Comment on attachment 319965 [details]
Patch

Attachment 319965 [details] did not pass mac-debug-ews (mac):
Output: http://webkit-queues.webkit.org/results/4458210

New failing tests:
js/string_replace_function.html
js/dom/string-replace-3.html
Comment 8 Build Bot 2017-09-05 19:58:21 PDT
Created attachment 319972 [details]
Archive of layout-test-results from ews112 for mac-elcapitan

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews112  Port: mac-elcapitan  Platform: Mac OS X 10.11.6
Comment 9 Build Bot 2017-09-05 20:02:21 PDT
Comment on attachment 319965 [details]
Patch

Attachment 319965 [details] did not pass mac-ews (mac):
Output: http://webkit-queues.webkit.org/results/4458273

New failing tests:
js/string_replace_function.html
js/dom/string-replace-3.html
Comment 10 Build Bot 2017-09-05 20:02:23 PDT
Created attachment 319973 [details]
Archive of layout-test-results from ews101 for mac-elcapitan

The attached test failures were seen while running run-webkit-tests on the mac-ews.
Bot: ews101  Port: mac-elcapitan  Platform: Mac OS X 10.11.6
Comment 11 Build Bot 2017-09-05 20:16:38 PDT
Comment on attachment 319965 [details]
Patch

Attachment 319965 [details] did not pass ios-sim-ews (ios-simulator-wk2):
Output: http://webkit-queues.webkit.org/results/4458260

New failing tests:
js/string_replace_function.html
js/dom/string-replace-3.html
Comment 12 Build Bot 2017-09-05 20:16:39 PDT
Created attachment 319974 [details]
Archive of layout-test-results from ews125 for ios-simulator-wk2

The attached test failures were seen while running run-webkit-tests on the ios-sim-ews.
Bot: ews125  Port: ios-simulator-wk2  Platform: Mac OS X 10.12.5
Comment 13 JF Bastien 2017-09-05 21:26:16 PDT
Comment on attachment 319965 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=319965&action=review

Not sure I'm the best person to review this, but here's a first pass.

> Source/JavaScriptCore/ChangeLog:12
> +        cases of malformed back references in String.prototype.replace() as I beleive that it

"believe"

> Source/JavaScriptCore/ChangeLog:18
> +        groups.

Bug to do so?

> Source/JavaScriptCore/dfg/DFGStrengthReductionPhase.cpp:517
> +                    dataLog("Giving up because of named capture groups.\n");

Link to bug.

Does it just print out the message when this is reached, but otherwise bails out of the optimization?

> Source/JavaScriptCore/runtime/RegExp.cpp:239
> +        m_namedsGroupToParenIndex.swap(pattern.m_namedsGroupToParenIndex);

"nameds" ?

> Source/JavaScriptCore/runtime/RegExp.h:89
> +        if (!i || m_captureGroupNames.size() <= i)

Why isn't index 0 valid?

> Source/JavaScriptCore/runtime/RegExp.h:99
> +        return m_namedsGroupToParenIndex.get(groupName);

Can't you do *it instead to avoid re-searching?

> Source/JavaScriptCore/runtime/RegExpMatchesArray.cpp:74
> +enum ShouldCreateGroups { DontCreateGroups, CreateGroups };

enum class

> Source/JavaScriptCore/runtime/RegExpMatchesArray.cpp:76
> +static Structure* createStructureImpl(VM& vm, JSGlobalObject* globalObject, IndexingType indexingType, ShouldCreateGroups shouldCreateGroups= DontCreateGroups)

Missing space before equal.

> Source/JavaScriptCore/runtime/RegExpMatchesArray.cpp:87
> +    }

Maybe it's weird here, but I like having a switch that covers both cases of the enum, without a default. That way you get a warning if you forget a case, and as a reader I know without looking up the enum that you've covered all cases. Even here, if DontCreateGroups is just "break;" I feel like it's kinda nice.

> Source/JavaScriptCore/runtime/RegExpMatchesArray.h:106
> +        }

Does this need to be duplicated for the good / bad times sides of the branch? Seems like you don't need the `hasNamedCaptures` bool, and `JSObject* groups` could instead be that "bool" (by being non nullptr if it has named captures). Then it's not duplicating the code too?

> Source/JavaScriptCore/runtime/RegExpMatchesArray.h:149
> +                if (!groupName.isEmpty())

Empty group name is just a regular capture?

> Source/JavaScriptCore/runtime/StringPrototype.cpp:213
> +            // Named back reference

Don't you need a check for `replacement.length() > i + 2` somewhere in here?

> Source/JavaScriptCore/yarr/YarrParser.h:867
> +    int tryConsumeUnicodeEscape()

Is this unicode escape stuff related to named capture?
Comment 14 Michael Saboff 2017-09-06 11:33:18 PDT
Comment on attachment 319965 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=319965&action=review

>> Source/JavaScriptCore/ChangeLog:12
>> +        cases of malformed back references in String.prototype.replace() as I beleive that it
> 
> "believe"

Fixed.

>> Source/JavaScriptCore/ChangeLog:18
>> +        groups.
> 
> Bug to do so?

I filed https://bugs.webkit.org/show_bug.cgi?id=176464.

>> Source/JavaScriptCore/dfg/DFGStrengthReductionPhase.cpp:517
>> +                    dataLog("Giving up because of named capture groups.\n");
> 
> Link to bug.
> 
> Does it just print out the message when this is reached, but otherwise bails out of the optimization?

I filed https://bugs.webkit.org/show_bug.cgi?id=176464.
Yes it logs and bails.  The logging matches existing code.

>> Source/JavaScriptCore/runtime/RegExp.cpp:239
>> +        m_namedsGroupToParenIndex.swap(pattern.m_namedsGroupToParenIndex);
> 
> "nameds" ?

Changed to m_namedGroupToParenIndex.

>> Source/JavaScriptCore/runtime/RegExp.h:89
>> +        if (!i || m_captureGroupNames.size() <= i)
> 
> Why isn't index 0 valid?

Because that is the whole match and not a sub-group.

>> Source/JavaScriptCore/runtime/RegExp.h:99
>> +        return m_namedsGroupToParenIndex.get(groupName);
> 
> Can't you do *it instead to avoid re-searching?

it->value is what I want.

>> Source/JavaScriptCore/runtime/RegExpMatchesArray.cpp:74
>> +enum ShouldCreateGroups { DontCreateGroups, CreateGroups };
> 
> enum class

Fixed.

>> Source/JavaScriptCore/runtime/RegExpMatchesArray.cpp:76
>> +static Structure* createStructureImpl(VM& vm, JSGlobalObject* globalObject, IndexingType indexingType, ShouldCreateGroups shouldCreateGroups= DontCreateGroups)
> 
> Missing space before equal.

Fixed.

>> Source/JavaScriptCore/runtime/RegExpMatchesArray.cpp:87
>> +    }
> 
> Maybe it's weird here, but I like having a switch that covers both cases of the enum, without a default. That way you get a warning if you forget a case, and as a reader I know without looking up the enum that you've covered all cases. Even here, if DontCreateGroups is just "break;" I feel like it's kinda nice.

I originally followed existing similar patterns, but I made the change you suggested.

>> Source/JavaScriptCore/runtime/RegExpMatchesArray.h:106
>> +        }
> 
> Does this need to be duplicated for the good / bad times sides of the branch? Seems like you don't need the `hasNamedCaptures` bool, and `JSObject* groups` could instead be that "bool" (by being non nullptr if it has named captures). Then it's not duplicating the code too?

I restructured things some and put the creation of the "groups" object inside of the setProperties lambda.  That reduced some of the duplication.  I kept hasNamedCaptures as it is used to select the structure for the array as well as whether or not "groups" gets created.

>> Source/JavaScriptCore/runtime/RegExpMatchesArray.h:149
>> +                if (!groupName.isEmpty())
> 
> Empty group name is just a regular capture?

Yes.  If a capture group name was not provided, then groupName will be empty.

>> Source/JavaScriptCore/runtime/StringPrototype.cpp:213
>> +            // Named back reference
> 
> Don't you need a check for `replacement.length() > i + 2` somewhere in here?

No, because it was checked up above at line 188 with "if (i + 1 == replacement.length()) break;".

>> Source/JavaScriptCore/yarr/YarrParser.h:867
>> +    int tryConsumeUnicodeEscape()
> 
> Is this unicode escape stuff related to named capture?

Yes, because the identifiers can contain unicode escapes.
Comment 15 Michael Saboff 2017-09-06 15:24:33 PDT
Created attachment 320067 [details]
Updated patch addressing review comments and test failures.
Comment 16 Build Bot 2017-09-06 15:26:16 PDT
Attachment 320067 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/yarr/YarrSyntaxChecker.cpp:47:  Missing space inside { }.  [whitespace/braces] [5]
ERROR: Source/JavaScriptCore/yarr/YarrSyntaxChecker.cpp:51:  Missing space inside { }.  [whitespace/braces] [5]
ERROR: Source/JavaScriptCore/yarr/YarrPattern.cpp:528:  Should have only a single space after a punctuation in a comment.  [whitespace/comments] [5]
ERROR: Source/JavaScriptCore/yarr/YarrPattern.cpp:957:  One space before end of line comments  [whitespace/comments] [5]
ERROR: Source/JavaScriptCore/yarr/YarrPattern.cpp:958:  One space before end of line comments  [whitespace/comments] [5]
Total errors found: 5 in 21 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 17 JF Bastien 2017-09-06 15:32:46 PDT
Comment on attachment 320067 [details]
Updated patch addressing review comments and test failures.

View in context: https://bugs.webkit.org/attachment.cgi?id=320067&action=review

r=me though it would be good if someone else took a look too since I'm not familiar with the code.

> Source/JavaScriptCore/runtime/RegExpMatchesArray.cpp:74
> +enum class ShouldCreateGroups { DontCreateGroups, CreateGroups };

With enum class I wouldn't repeat "Group" in the enumerated names. Maybe even just ShouldCreateGroup::{No,Yes} ?
Comment 18 Michael Saboff 2017-09-06 15:40:21 PDT
(In reply to JF Bastien from comment #17)
> Comment on attachment 320067 [details]
> Updated patch addressing review comments and test failures.
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=320067&action=review
> 
> r=me though it would be good if someone else took a look too since I'm not
> familiar with the code.

I'll set review back to ? to get someone else to look.

> > Source/JavaScriptCore/runtime/RegExpMatchesArray.cpp:74
> > +enum class ShouldCreateGroups { DontCreateGroups, CreateGroups };
> 
> With enum class I wouldn't repeat "Group" in the enumerated names. Maybe
> even just ShouldCreateGroup::{No,Yes} ?

I changed it to No / Yes.
Comment 19 Filip Pizlo 2017-09-07 12:54:52 PDT
Comment on attachment 320067 [details]
Updated patch addressing review comments and test failures.

View in context: https://bugs.webkit.org/attachment.cgi?id=320067&action=review

> Source/JavaScriptCore/runtime/RegExpMatchesArray.h:107
> -        
> +

Revert

> Source/JavaScriptCore/runtime/RegExpMatchesArray.h:130
> -        
> +

Revert
Comment 20 Michael Saboff 2017-09-07 16:13:40 PDT
Committed r221769: <http://trac.webkit.org/changeset/221769>
Comment 21 Darin Adler 2017-09-10 14:32:11 PDT
Comment on attachment 320067 [details]
Updated patch addressing review comments and test failures.

View in context: https://bugs.webkit.org/attachment.cgi?id=320067&action=review

> Source/JavaScriptCore/yarr/YarrParser.h:203
> +        NO_RETURN_DUE_TO_ASSERT void atomNamedBackReference(String) { RELEASE_ASSERT_NOT_REACHED(); }

This is unconventional. Since String is just an object that holds a RefPtr, we normally pass "const String&" or "String&&", and very rarely pass "String".

> Source/JavaScriptCore/yarr/YarrSyntaxChecker.cpp:51
> +    void atomNamedBackReference(String) {}

This is unconventional. Since String is just an object that holds a RefPtr, we normally pass "const String&" or "String&&", and very rarely pass "String".