Bug 189407 - Test262 failure with Named Capture Groups - using a reference before the group is defined
Summary: Test262 failure with Named Capture Groups - using a reference before the grou...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Michael Saboff
URL:
Keywords: InRadar
Depends on:
Blocks: 178175
  Show dependency treegraph
 
Reported: 2018-09-07 09:03 PDT by Michael Saboff
Modified: 2020-03-24 13:08 PDT (History)
7 users (show)

See Also:


Attachments
Patch (21.30 KB, patch)
2018-09-07 14:03 PDT, Michael Saboff
no flags Details | Formatted Diff | Diff
Patch with suggested updates (22.29 KB, patch)
2018-09-07 15:55 PDT, Michael Saboff
achristensen: review-
Details | Formatted Diff | Diff
Patch with updated ContentExtensionTest.ParsingFailures tests (24.76 KB, patch)
2018-09-10 16:47 PDT, Michael Saboff
achristensen: review+
commit-queue: commit-queue-
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Michael Saboff 2018-09-07 09:03:50 PDT
This Test262 test is failing - JSTests/test262/test/built-ins/RegExp/named-groups/unicode-references.js
Exception: SyntaxError: Invalid regular expression: invalid backreference for unicode pattern
at JSTests/test262/test/built-ins/RegExp/named-groups/unicode-references.js:33

The line in question is:
  assert(compareArray(["bab", "b"], "bab".match(/\k<a>(?<a>b)\w\k<a>/u)));

Here the first reference of the named group “a” is being used before the referenced is defined.  This is allowed by the spec and should match the empty string.
Comment 1 Michael Saboff 2018-09-07 09:04:16 PDT
<rdar://problem/34844068>
Comment 2 Michael Saboff 2018-09-07 14:03:35 PDT
Created attachment 349187 [details]
Patch
Comment 3 EWS Watchlist 2018-09-07 14:05:25 PDT
Attachment 349187 [details] did not pass style-queue:


ERROR: JSTests/ChangeLog:11:  Need whitespace between colon and description  [changelog/filechangedescriptionwhitespace] [5]
ERROR: Source/JavaScriptCore/yarr/YarrSyntaxChecker.cpp:53:  Missing space inside { }.  [whitespace/braces] [5]
Total errors found: 2 in 13 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 4 Saam Barati 2018-09-07 14:46:34 PDT
Comment on attachment 349187 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=349187&action=review

> Source/JavaScriptCore/yarr/YarrPattern.cpp:464
> +        for (auto it = m_pattern.m_namedForwardReferences.begin(), itEnd = m_pattern.m_namedForwardReferences.end(); it < itEnd; ++it) {

Why not "for ( : )" loop?

> Source/JavaScriptCore/yarr/YarrPattern.cpp:685
> +    bool isValidNamedForwardReference(String subpatternName)

const String&? So we don't copy?

> Source/JavaScriptCore/yarr/YarrPattern.cpp:690
> +        if (!m_invalidNamedForwardReferences.find(subpatternName))
> +            return false;
> +
> +        return true;

this could be:
`return m_invalidNamedForwardReferences.contains(subpatternName)`

I think using "find" is also wrong to check "!" against. We should make sure there is a test that properly catches where this will go wrong.

> Source/JavaScriptCore/yarr/YarrPattern.cpp:693
> +    void atomNamedForwardReference(String subpatternName)

ditto w.r.t copy?

> Source/JavaScriptCore/yarr/YarrPattern.cpp:1168
> +    //  &&&&

please remove

> Source/JavaScriptCore/yarr/YarrPattern.h:399
> +        for (auto it = m_namedForwardReferences.begin(), itEnd = m_namedForwardReferences.end(); it < itEnd; ++it) {

style: Why not make this a "for ( : )" loop?
Comment 5 Michael Saboff 2018-09-07 15:55:01 PDT
Created attachment 349212 [details]
Patch with suggested updates
Comment 6 EWS Watchlist 2018-09-07 15:58:12 PDT
Attachment 349212 [details] did not pass style-queue:


ERROR: JSTests/ChangeLog:11:  Need whitespace between colon and description  [changelog/filechangedescriptionwhitespace] [5]
ERROR: Source/JavaScriptCore/yarr/YarrSyntaxChecker.cpp:51:  Missing space inside { }.  [whitespace/braces] [5]
ERROR: Source/JavaScriptCore/yarr/YarrSyntaxChecker.cpp:53:  Missing space inside { }.  [whitespace/braces] [5]
Total errors found: 3 in 13 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 7 Alex Christensen 2018-09-07 17:04:35 PDT
Comment on attachment 349212 [details]
Patch with suggested updates

View in context: https://bugs.webkit.org/attachment.cgi?id=349212&action=review

> Source/JavaScriptCore/yarr/YarrPattern.cpp:687
> +    bool isValidNamedForwardReference(const String& subpatternName)
> +    {
> +        return !m_unmatchedNamedForwardReferences.contains(subpatternName);

All the other YARR callbacks are informative.  If you moved this check to inside your implementation of atomNamedForwardReference, then YARR would remain a parser with a nice interface.
Said another way, I don't think YARR should ask if it should inform the delegate of something.  It should just inform the delegate and let the delegate decide what to do with it.

> Source/WebCore/contentextensions/URLFilterParser.cpp:144
> +        fail(URLFilterParser::ForwardReference);

Could you please add a test to the ContentExtensionTest.ParsingFailures test to verify this can be hit?  I'm assuming backwards compatibility isn't an issue
Comment 8 Michael Saboff 2018-09-09 15:20:56 PDT
(In reply to Alex Christensen from comment #7)
> Comment on attachment 349212 [details]
> Patch with suggested updates
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=349212&action=review
> 
> > Source/JavaScriptCore/yarr/YarrPattern.cpp:687
> > +    bool isValidNamedForwardReference(const String& subpatternName)
> > +    {
> > +        return !m_unmatchedNamedForwardReferences.contains(subpatternName);
> 
> All the other YARR callbacks are informative.  If you moved this check to
> inside your implementation of atomNamedForwardReference, then YARR would
> remain a parser with a nice interface.
> Said another way, I don't think YARR should ask if it should inform the
> delegate of something.  It should just inform the delegate and let the
> delegate decide what to do with it.

The problem here is that the parser lacks the state information to determine if a RegExp construct is a syntax error or acceptable, and if acceptable how to process it.  We parse the expression once.  If there is some issues with forward references, we parse again using state from the first parsing.  The two parsings are completely independent and don't share state.  That has been the case long before this change.

If there are named forward references that don't have a subsequent named capture group, we shouldn't create a forward reference.  For Unicode patterns, it is a Syntax Error.  The more involved case is for non-unicode patterns where instead of processing as a forward reference, we treat the characters we parsed for the forward references as literal characters to match.  That logic needs to be in the parser as it has to restores parsing state, before creating the necessary character atoms.  Therefore this check cannot be in the implementation of atomNamedForwardReference unless atomNamedForwardReference informs the parser the result of the check.

The other way I considered writing this was to pass the parser an optional set of valid named forward references.  If that optional set is not provided all names are valid.  If there is a set it would contain the valid named groups.  I went with the isValidNamedForwardReference callback method as I believe that it more clearly describes what is specified in the ECMAScript standard.  The bottom line is that state needs to be shared between the parser and the YARR pattern delegate, either through callback results or arguments to the parser.

> > Source/WebCore/contentextensions/URLFilterParser.cpp:144
> > +        fail(URLFilterParser::ForwardReference);
> 
> Could you please add a test to the ContentExtensionTest.ParsingFailures test
> to verify this can be hit?  I'm assuming backwards compatibility isn't an
> issue

Sure.
Comment 9 Alex Christensen 2018-09-10 10:23:34 PDT
Comment on attachment 349212 [details]
Patch with suggested updates

View in context: https://bugs.webkit.org/attachment.cgi?id=349212&action=review

> Source/JavaScriptCore/yarr/YarrParser.h:433
> +                        if (delegate.isValidNamedForwardReference(groupName.value())) {
> +                            delegate.atomNamedForwardReference(groupName.value());

If you replaced these two lines with one line:
delegate.atomMaybeNamedForwardReference(groupName.value());

and in your delegate implementation, check if (!m_unmatchedNamedForwardReferences.contains(subpatternName)) ...
then it would behave identically and I believe it would be a better design.
Comment 10 Michael Saboff 2018-09-10 16:45:33 PDT
(In reply to Alex Christensen from comment #9)
> Comment on attachment 349212 [details]
> Patch with suggested updates
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=349212&action=review
> 
> > Source/JavaScriptCore/yarr/YarrParser.h:433
> > +                        if (delegate.isValidNamedForwardReference(groupName.value())) {
> > +                            delegate.atomNamedForwardReference(groupName.value());
> 
> If you replaced these two lines with one line:
> delegate.atomMaybeNamedForwardReference(groupName.value());
> 
> and in your delegate implementation, check if
> (!m_unmatchedNamedForwardReferences.contains(subpatternName)) ...
> then it would behave identically and I believe it would be a better design.

Alex and I spoke in person and we agreed that the current structure although not as architecturally pure as one would like, it is the best alternative for performance and readability.
Comment 11 Michael Saboff 2018-09-10 16:47:14 PDT
Created attachment 349355 [details]
Patch with updated ContentExtensionTest.ParsingFailures tests
Comment 12 EWS Watchlist 2018-09-10 16:48:53 PDT
Attachment 349355 [details] did not pass style-queue:


ERROR: Source/JavaScriptCore/yarr/YarrSyntaxChecker.cpp:51:  Missing space inside { }.  [whitespace/braces] [5]
ERROR: Source/JavaScriptCore/yarr/YarrSyntaxChecker.cpp:53:  Missing space inside { }.  [whitespace/braces] [5]
Total errors found: 2 in 15 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 13 WebKit Commit Bot 2018-09-10 21:37:40 PDT
Comment on attachment 349355 [details]
Patch with updated ContentExtensionTest.ParsingFailures tests

Rejecting attachment 349355 [details] from commit-queue.

Failed to run "['/Volumes/Data/EWS/WebKit/Tools/Scripts/webkit-patch', '--status-host=webkit-queues.webkit.org', '--bot-id=webkit-cq-01', 'validate-changelog', '--check-oops', '--non-interactive', 349355, '--port=mac']" exit_code: 1 cwd: /Volumes/Data/EWS/WebKit

Traceback (most recent call last):
  File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkit-patch", line 84, in <module>
    main()
  File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkit-patch", line 79, in main
    WebKitPatch(os.path.abspath(__file__)).main()
  File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/tool/multicommandtool.py", line 305, in main
    result = command.check_arguments_and_execute(options, args, self)
  File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/tool/multicommandtool.py", line 123, in check_arguments_and_execute
    return self.execute(options, args, tool) or 0
  File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/tool/commands/abstractsequencedcommand.py", line 55, in execute
    self._sequence.run_and_handle_errors(tool, options, state)
  File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/tool/commands/stepsequence.py", line 73, in run_and_handle_errors
    self._run(tool, options, state)
  File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/tool/commands/stepsequence.py", line 67, in _run
    step(tool, options).run(state)
  File "/Volumes/Data/EWS/WebKit/Tools/Scripts/webkitpy/tool/steps/validatereviewer.py", line 54, in run
    if changelog_entry.has_valid_reviewer():
AttributeError: 'NoneType' object has no attribute 'has_valid_reviewer'

Full output: https://webkit-queues.webkit.org/results/9169100
Comment 14 Michael Saboff 2018-09-10 21:53:58 PDT
Committed r235882: <https://trac.webkit.org/changeset/235882>