RESOLVED FIXED 176435
Add support for RegExp named capture groups
https://bugs.webkit.org/show_bug.cgi?id=176435
Summary Add support for RegExp named capture groups
Michael Saboff
Reported 2017-09-05 18:10:10 PDT
This is to add support for the ECMAScript proposed Regular Expression named capture groups. See https://github.com/tc39/proposal-regexp-named-groups for details. For example, the RegExp /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/ when matched would create a match object result with properties group.year, group.month and group.day in addition to the array like capture results.
Attachments
Patch (53.04 KB, patch)
2017-09-05 18:24 PDT, Michael Saboff
buildbot: commit-queue-
Archive of layout-test-results from ews107 for mac-elcapitan-wk2 (1.58 MB, application/zip)
2017-09-05 19:55 PDT, Build Bot
no flags
Archive of layout-test-results from ews112 for mac-elcapitan (2.38 MB, application/zip)
2017-09-05 19:58 PDT, Build Bot
no flags
Archive of layout-test-results from ews101 for mac-elcapitan (1.55 MB, application/zip)
2017-09-05 20:02 PDT, Build Bot
no flags
Archive of layout-test-results from ews125 for ios-simulator-wk2 (1.41 MB, application/zip)
2017-09-05 20:16 PDT, Build Bot
no flags
Updated patch addressing review comments and test failures. (53.99 KB, patch)
2017-09-06 15:24 PDT, Michael Saboff
fpizlo: review+
Michael Saboff
Comment 1 2017-09-05 18:10:34 PDT
Michael Saboff
Comment 2 2017-09-05 18:24:37 PDT
Created attachment 319965 [details] Patch The style checker errors are due to the checker looking at code that didn't change or new code following existing formatting.
Build Bot
Comment 3 2017-09-05 18:26:34 PDT
Attachment 319965 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/yarr/YarrSyntaxChecker.cpp:47: Missing space inside { }. [whitespace/braces] [5] ERROR: Source/JavaScriptCore/yarr/YarrSyntaxChecker.cpp:51: Missing space inside { }. [whitespace/braces] [5] ERROR: Source/JavaScriptCore/yarr/YarrPattern.cpp:528: Should have only a single space after a punctuation in a comment. [whitespace/comments] [5] ERROR: Source/JavaScriptCore/yarr/YarrPattern.cpp:957: One space before end of line comments [whitespace/comments] [5] ERROR: Source/JavaScriptCore/yarr/YarrPattern.cpp:958: One space before end of line comments [whitespace/comments] [5] Total errors found: 5 in 21 files If any of these errors are false positives, please file a bug against check-webkit-style.
Build Bot
Comment 4 2017-09-05 19:14:28 PDT
Comment on attachment 319965 [details] Patch Attachment 319965 [details] did not pass jsc-ews (mac): Output: http://webkit-queues.webkit.org/results/4458091 New failing tests: jsc-layout-tests.yaml/js/script-tests/string_replace_function.js.layout-ftl-no-cjit mozilla-tests.yaml/ecma_2/RegExp/hex-001.js.mozilla-ftl-eager-no-cjit-validate-phases mozilla-tests.yaml/ecma_2/RegExp/unicode-001.js.mozilla-ftl-eager-no-cjit-validate-phases jsc-layout-tests.yaml/js/script-tests/regexp-in-and-foreach-handling.js.layout-ftl-eager-no-cjit jsc-layout-tests.yaml/js/script-tests/string_replace_function.js.layout-no-llint mozilla-tests.yaml/ecma_2/RegExp/hex-001.js.mozilla-dfg-eager-no-cjit-validate-phases jsc-layout-tests.yaml/js/script-tests/string_replace_function.js.layout-ftl-eager-no-cjit jsc-layout-tests.yaml/js/script-tests/regexp-in-and-foreach-handling.js.layout-dfg-eager-no-cjit jsc-layout-tests.yaml/js/script-tests/string_replace_function.js.layout-dfg-eager-no-cjit microbenchmarks/emscripten-cube2hash.js.ftl-eager-no-cjit-b3o1 mozilla-tests.yaml/ecma_2/RegExp/unicode-001.js.mozilla-dfg-eager-no-cjit-validate-phases jsc-layout-tests.yaml/js/script-tests/string_replace_function.js.layout stress/regexp-replace-proxy.js.ftl-eager-no-cjit mozilla-tests.yaml/ecma_2/RegExp/exec-002.js.mozilla-ftl-eager-no-cjit-validate-phases jsc-layout-tests.yaml/js/script-tests/string_replace_function.js.layout-no-cjit jsc-layout-tests.yaml/js/script-tests/string_replace_function.js.layout-no-ftl
Build Bot
Comment 5 2017-09-05 19:55:26 PDT
Comment on attachment 319965 [details] Patch Attachment 319965 [details] did not pass mac-wk2-ews (mac-wk2): Output: http://webkit-queues.webkit.org/results/4458230 New failing tests: js/string_replace_function.html js/dom/string-replace-3.html
Build Bot
Comment 6 2017-09-05 19:55:27 PDT
Created attachment 319971 [details] Archive of layout-test-results from ews107 for mac-elcapitan-wk2 The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews. Bot: ews107 Port: mac-elcapitan-wk2 Platform: Mac OS X 10.11.6
Build Bot
Comment 7 2017-09-05 19:58:20 PDT
Comment on attachment 319965 [details] Patch Attachment 319965 [details] did not pass mac-debug-ews (mac): Output: http://webkit-queues.webkit.org/results/4458210 New failing tests: js/string_replace_function.html js/dom/string-replace-3.html
Build Bot
Comment 8 2017-09-05 19:58:21 PDT
Created attachment 319972 [details] Archive of layout-test-results from ews112 for mac-elcapitan The attached test failures were seen while running run-webkit-tests on the mac-debug-ews. Bot: ews112 Port: mac-elcapitan Platform: Mac OS X 10.11.6
Build Bot
Comment 9 2017-09-05 20:02:21 PDT
Comment on attachment 319965 [details] Patch Attachment 319965 [details] did not pass mac-ews (mac): Output: http://webkit-queues.webkit.org/results/4458273 New failing tests: js/string_replace_function.html js/dom/string-replace-3.html
Build Bot
Comment 10 2017-09-05 20:02:23 PDT
Created attachment 319973 [details] Archive of layout-test-results from ews101 for mac-elcapitan The attached test failures were seen while running run-webkit-tests on the mac-ews. Bot: ews101 Port: mac-elcapitan Platform: Mac OS X 10.11.6
Build Bot
Comment 11 2017-09-05 20:16:38 PDT
Comment on attachment 319965 [details] Patch Attachment 319965 [details] did not pass ios-sim-ews (ios-simulator-wk2): Output: http://webkit-queues.webkit.org/results/4458260 New failing tests: js/string_replace_function.html js/dom/string-replace-3.html
Build Bot
Comment 12 2017-09-05 20:16:39 PDT
Created attachment 319974 [details] Archive of layout-test-results from ews125 for ios-simulator-wk2 The attached test failures were seen while running run-webkit-tests on the ios-sim-ews. Bot: ews125 Port: ios-simulator-wk2 Platform: Mac OS X 10.12.5
JF Bastien
Comment 13 2017-09-05 21:26:16 PDT
Comment on attachment 319965 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=319965&action=review Not sure I'm the best person to review this, but here's a first pass. > Source/JavaScriptCore/ChangeLog:12 > + cases of malformed back references in String.prototype.replace() as I beleive that it "believe" > Source/JavaScriptCore/ChangeLog:18 > + groups. Bug to do so? > Source/JavaScriptCore/dfg/DFGStrengthReductionPhase.cpp:517 > + dataLog("Giving up because of named capture groups.\n"); Link to bug. Does it just print out the message when this is reached, but otherwise bails out of the optimization? > Source/JavaScriptCore/runtime/RegExp.cpp:239 > + m_namedsGroupToParenIndex.swap(pattern.m_namedsGroupToParenIndex); "nameds" ? > Source/JavaScriptCore/runtime/RegExp.h:89 > + if (!i || m_captureGroupNames.size() <= i) Why isn't index 0 valid? > Source/JavaScriptCore/runtime/RegExp.h:99 > + return m_namedsGroupToParenIndex.get(groupName); Can't you do *it instead to avoid re-searching? > Source/JavaScriptCore/runtime/RegExpMatchesArray.cpp:74 > +enum ShouldCreateGroups { DontCreateGroups, CreateGroups }; enum class > Source/JavaScriptCore/runtime/RegExpMatchesArray.cpp:76 > +static Structure* createStructureImpl(VM& vm, JSGlobalObject* globalObject, IndexingType indexingType, ShouldCreateGroups shouldCreateGroups= DontCreateGroups) Missing space before equal. > Source/JavaScriptCore/runtime/RegExpMatchesArray.cpp:87 > + } Maybe it's weird here, but I like having a switch that covers both cases of the enum, without a default. That way you get a warning if you forget a case, and as a reader I know without looking up the enum that you've covered all cases. Even here, if DontCreateGroups is just "break;" I feel like it's kinda nice. > Source/JavaScriptCore/runtime/RegExpMatchesArray.h:106 > + } Does this need to be duplicated for the good / bad times sides of the branch? Seems like you don't need the `hasNamedCaptures` bool, and `JSObject* groups` could instead be that "bool" (by being non nullptr if it has named captures). Then it's not duplicating the code too? > Source/JavaScriptCore/runtime/RegExpMatchesArray.h:149 > + if (!groupName.isEmpty()) Empty group name is just a regular capture? > Source/JavaScriptCore/runtime/StringPrototype.cpp:213 > + // Named back reference Don't you need a check for `replacement.length() > i + 2` somewhere in here? > Source/JavaScriptCore/yarr/YarrParser.h:867 > + int tryConsumeUnicodeEscape() Is this unicode escape stuff related to named capture?
Michael Saboff
Comment 14 2017-09-06 11:33:18 PDT
Comment on attachment 319965 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=319965&action=review >> Source/JavaScriptCore/ChangeLog:12 >> + cases of malformed back references in String.prototype.replace() as I beleive that it > > "believe" Fixed. >> Source/JavaScriptCore/ChangeLog:18 >> + groups. > > Bug to do so? I filed https://bugs.webkit.org/show_bug.cgi?id=176464. >> Source/JavaScriptCore/dfg/DFGStrengthReductionPhase.cpp:517 >> + dataLog("Giving up because of named capture groups.\n"); > > Link to bug. > > Does it just print out the message when this is reached, but otherwise bails out of the optimization? I filed https://bugs.webkit.org/show_bug.cgi?id=176464. Yes it logs and bails. The logging matches existing code. >> Source/JavaScriptCore/runtime/RegExp.cpp:239 >> + m_namedsGroupToParenIndex.swap(pattern.m_namedsGroupToParenIndex); > > "nameds" ? Changed to m_namedGroupToParenIndex. >> Source/JavaScriptCore/runtime/RegExp.h:89 >> + if (!i || m_captureGroupNames.size() <= i) > > Why isn't index 0 valid? Because that is the whole match and not a sub-group. >> Source/JavaScriptCore/runtime/RegExp.h:99 >> + return m_namedsGroupToParenIndex.get(groupName); > > Can't you do *it instead to avoid re-searching? it->value is what I want. >> Source/JavaScriptCore/runtime/RegExpMatchesArray.cpp:74 >> +enum ShouldCreateGroups { DontCreateGroups, CreateGroups }; > > enum class Fixed. >> Source/JavaScriptCore/runtime/RegExpMatchesArray.cpp:76 >> +static Structure* createStructureImpl(VM& vm, JSGlobalObject* globalObject, IndexingType indexingType, ShouldCreateGroups shouldCreateGroups= DontCreateGroups) > > Missing space before equal. Fixed. >> Source/JavaScriptCore/runtime/RegExpMatchesArray.cpp:87 >> + } > > Maybe it's weird here, but I like having a switch that covers both cases of the enum, without a default. That way you get a warning if you forget a case, and as a reader I know without looking up the enum that you've covered all cases. Even here, if DontCreateGroups is just "break;" I feel like it's kinda nice. I originally followed existing similar patterns, but I made the change you suggested. >> Source/JavaScriptCore/runtime/RegExpMatchesArray.h:106 >> + } > > Does this need to be duplicated for the good / bad times sides of the branch? Seems like you don't need the `hasNamedCaptures` bool, and `JSObject* groups` could instead be that "bool" (by being non nullptr if it has named captures). Then it's not duplicating the code too? I restructured things some and put the creation of the "groups" object inside of the setProperties lambda. That reduced some of the duplication. I kept hasNamedCaptures as it is used to select the structure for the array as well as whether or not "groups" gets created. >> Source/JavaScriptCore/runtime/RegExpMatchesArray.h:149 >> + if (!groupName.isEmpty()) > > Empty group name is just a regular capture? Yes. If a capture group name was not provided, then groupName will be empty. >> Source/JavaScriptCore/runtime/StringPrototype.cpp:213 >> + // Named back reference > > Don't you need a check for `replacement.length() > i + 2` somewhere in here? No, because it was checked up above at line 188 with "if (i + 1 == replacement.length()) break;". >> Source/JavaScriptCore/yarr/YarrParser.h:867 >> + int tryConsumeUnicodeEscape() > > Is this unicode escape stuff related to named capture? Yes, because the identifiers can contain unicode escapes.
Michael Saboff
Comment 15 2017-09-06 15:24:33 PDT
Created attachment 320067 [details] Updated patch addressing review comments and test failures.
Build Bot
Comment 16 2017-09-06 15:26:16 PDT
Attachment 320067 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/yarr/YarrSyntaxChecker.cpp:47: Missing space inside { }. [whitespace/braces] [5] ERROR: Source/JavaScriptCore/yarr/YarrSyntaxChecker.cpp:51: Missing space inside { }. [whitespace/braces] [5] ERROR: Source/JavaScriptCore/yarr/YarrPattern.cpp:528: Should have only a single space after a punctuation in a comment. [whitespace/comments] [5] ERROR: Source/JavaScriptCore/yarr/YarrPattern.cpp:957: One space before end of line comments [whitespace/comments] [5] ERROR: Source/JavaScriptCore/yarr/YarrPattern.cpp:958: One space before end of line comments [whitespace/comments] [5] Total errors found: 5 in 21 files If any of these errors are false positives, please file a bug against check-webkit-style.
JF Bastien
Comment 17 2017-09-06 15:32:46 PDT
Comment on attachment 320067 [details] Updated patch addressing review comments and test failures. View in context: https://bugs.webkit.org/attachment.cgi?id=320067&action=review r=me though it would be good if someone else took a look too since I'm not familiar with the code. > Source/JavaScriptCore/runtime/RegExpMatchesArray.cpp:74 > +enum class ShouldCreateGroups { DontCreateGroups, CreateGroups }; With enum class I wouldn't repeat "Group" in the enumerated names. Maybe even just ShouldCreateGroup::{No,Yes} ?
Michael Saboff
Comment 18 2017-09-06 15:40:21 PDT
(In reply to JF Bastien from comment #17) > Comment on attachment 320067 [details] > Updated patch addressing review comments and test failures. > > View in context: > https://bugs.webkit.org/attachment.cgi?id=320067&action=review > > r=me though it would be good if someone else took a look too since I'm not > familiar with the code. I'll set review back to ? to get someone else to look. > > Source/JavaScriptCore/runtime/RegExpMatchesArray.cpp:74 > > +enum class ShouldCreateGroups { DontCreateGroups, CreateGroups }; > > With enum class I wouldn't repeat "Group" in the enumerated names. Maybe > even just ShouldCreateGroup::{No,Yes} ? I changed it to No / Yes.
Filip Pizlo
Comment 19 2017-09-07 12:54:52 PDT
Comment on attachment 320067 [details] Updated patch addressing review comments and test failures. View in context: https://bugs.webkit.org/attachment.cgi?id=320067&action=review > Source/JavaScriptCore/runtime/RegExpMatchesArray.h:107 > - > + Revert > Source/JavaScriptCore/runtime/RegExpMatchesArray.h:130 > - > + Revert
Michael Saboff
Comment 20 2017-09-07 16:13:40 PDT
Darin Adler
Comment 21 2017-09-10 14:32:11 PDT
Comment on attachment 320067 [details] Updated patch addressing review comments and test failures. View in context: https://bugs.webkit.org/attachment.cgi?id=320067&action=review > Source/JavaScriptCore/yarr/YarrParser.h:203 > + NO_RETURN_DUE_TO_ASSERT void atomNamedBackReference(String) { RELEASE_ASSERT_NOT_REACHED(); } This is unconventional. Since String is just an object that holds a RefPtr, we normally pass "const String&" or "String&&", and very rarely pass "String". > Source/JavaScriptCore/yarr/YarrSyntaxChecker.cpp:51 > + void atomNamedBackReference(String) {} This is unconventional. Since String is just an object that holds a RefPtr, we normally pass "const String&" or "String&&", and very rarely pass "String".
Note You need to log in before you can comment on or make changes to this bug.