Bug 195514 - Invalid flags in a RegExp literal should be an early SyntaxError
Summary: Invalid flags in a RegExp literal should be an early SyntaxError
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Ross Kirsling
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2019-03-08 23:41 PST by Ross Kirsling
Modified: 2019-03-10 23:21 PDT (History)
10 users (show)

See Also:


Attachments
Patch (8.34 KB, patch)
2019-03-08 23:54 PST, Ross Kirsling
no flags Details | Formatted Diff | Diff
Patch with YarrFlags refactor (45.71 KB, patch)
2019-03-09 16:29 PST, Ross Kirsling
no flags Details | Formatted Diff | Diff
Patch with YarrFlags refactor (45.61 KB, patch)
2019-03-09 16:34 PST, Ross Kirsling
no flags Details | Formatted Diff | Diff
Patch with YarrFlags refactor (45.63 KB, patch)
2019-03-09 16:43 PST, Ross Kirsling
no flags Details | Formatted Diff | Diff
Patch with YarrFlags refactor (45.67 KB, patch)
2019-03-09 17:51 PST, Ross Kirsling
no flags Details | Formatted Diff | Diff
Patch with YarrFlags refactor (45.67 KB, patch)
2019-03-09 18:09 PST, Ross Kirsling
no flags Details | Formatted Diff | Diff
Archive of layout-test-results from ews105 for mac-highsierra-wk2 (2.54 MB, application/zip)
2019-03-09 19:35 PST, EWS Watchlist
no flags Details
Patch for landing (47.50 KB, patch)
2019-03-10 22:01 PDT, Ross Kirsling
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Ross Kirsling 2019-03-08 23:41:11 PST
Invalid flags in a RegExp literal should be an early SyntaxError
Comment 1 Ross Kirsling 2019-03-08 23:54:51 PST
Created attachment 364121 [details]
Patch
Comment 2 Ross Kirsling 2019-03-09 00:11:46 PST
Note: When the pattern and flags are both invalid (e.g. /+/gg), the spec doesn't care which one the error message points out. V8 and SM both prioritize the flags; I've chosen to do similarly here (since it saves us from that extra .contains('u') call).
Comment 3 Ross Kirsling 2019-03-09 00:30:32 PST
Comment on attachment 364121 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=364121&action=review

> Source/JavaScriptCore/yarr/YarrSyntaxChecker.cpp:59
> +static RegExpFlags parseFlags(StringView string)

I do feel a bit weird about having basically copy-pasted this function from runtime/RegExp.cpp, but I couldn't decide on an appropriate common location -- perhaps runtime/RegExpKey.h would work, given that that's where RegExpFlags is defined...?
Comment 4 Ross Kirsling 2019-03-09 13:35:05 PST
(In reply to Ross Kirsling from comment #3)
> Comment on attachment 364121 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=364121&action=review
> 
> > Source/JavaScriptCore/yarr/YarrSyntaxChecker.cpp:59
> > +static RegExpFlags parseFlags(StringView string)
> 
> I do feel a bit weird about having basically copy-pasted this function from
> runtime/RegExp.cpp, but I couldn't decide on an appropriate common location
> -- perhaps runtime/RegExpKey.h would work, given that that's where
> RegExpFlags is defined...?

Perhaps better still would be to move both the enum and function to a new yarr/YarrFlags.h? YARR doesn't actually care about the RegExpKey struct, so it's already weird that it needs to reach outside to get to RegExpFlags...
Comment 5 Ross Kirsling 2019-03-09 16:29:35 PST Comment hidden (obsolete)
Comment 6 EWS Watchlist 2019-03-09 16:31:30 PST Comment hidden (obsolete)
Comment 7 Ross Kirsling 2019-03-09 16:34:25 PST Comment hidden (obsolete)
Comment 8 Ross Kirsling 2019-03-09 16:43:46 PST Comment hidden (obsolete)
Comment 9 Ross Kirsling 2019-03-09 17:51:22 PST Comment hidden (obsolete)
Comment 10 Ross Kirsling 2019-03-09 18:09:07 PST
Created attachment 364151 [details]
Patch with YarrFlags refactor
Comment 11 EWS Watchlist 2019-03-09 19:35:07 PST Comment hidden (obsolete)
Comment 12 EWS Watchlist 2019-03-09 19:35:08 PST Comment hidden (obsolete)
Comment 13 Ross Kirsling 2019-03-09 20:50:14 PST
Comment on attachment 364151 [details]
Patch with YarrFlags refactor

View in context: https://bugs.webkit.org/attachment.cgi?id=364151&action=review

> Source/JavaScriptCore/bytecompiler/NodesCodegen.cpp:147
>      if (regExp->isValid())
>          return generator.emitNewRegExp(generator.finalDestination(dst), regExp);

Note:
We could almost assert isValid here, if not for invalid references in Unicode patterns like /\1/u and /\k<a>\u.
These too are supposed* to be early errors though, so we can handle this in a subsequent patch.

* https://test262.report/browse/language/literals/regexp/u-dec-esc.js
  https://test262.report/browse/language/literals/regexp/named-groups/invalid-dangling-groupname-without-group-u.js
Comment 14 Darin Adler 2019-03-10 13:34:22 PDT
Comment on attachment 364151 [details]
Patch with YarrFlags refactor

View in context: https://bugs.webkit.org/attachment.cgi?id=364151&action=review

Looks like a really good change.

I’m not sure the includes here are minimal. For example, YarrFlags.h includes OptionSet.h but there are many cases where we are adding both. Similarly, anywhere we add StringHash.h we can remove other includes like WTFString.h.

>> Source/JavaScriptCore/bytecompiler/NodesCodegen.cpp:147
>>          return generator.emitNewRegExp(generator.finalDestination(dst), regExp);
> 
> Note:
> We could almost assert isValid here, if not for invalid references in Unicode patterns like /\1/u and /\k<a>\u.
> These too are supposed* to be early errors though, so we can handle this in a subsequent patch.
> 
> * https://test262.report/browse/language/literals/regexp/u-dec-esc.js
>   https://test262.report/browse/language/literals/regexp/named-groups/invalid-dangling-groupname-without-group-u.js

Perhaps we can do *more* than just assert isValid in a subsequent patch. I suspect we could get rid of the concept of an invalid RegExp entirely and remove the RegExp::invalid function as well.

> Source/JavaScriptCore/runtime/CachedTypes.cpp:42
> +#include "YarrFlags.h"

I think we could use a forward declaration here instead of including the header. Not 100% sure. Maybe not worth trying to optimize that.

> Source/JavaScriptCore/runtime/CachedTypes.cpp:44
>  #include <wtf/Forward.h>

I doubt this is needed. I am sure that one of the other headers includes it.

> Source/JavaScriptCore/runtime/RegExpCache.h:34
> +#include "YarrFlags.h"

Can this be done with a forward declaration instead of an include? I believe that in the past we found that we can just forward declare enumerations to compile things like this.

> Source/JavaScriptCore/runtime/RegExpConstructor.cpp:212
> +    EXCEPTION_ASSERT(!!scope.exception() == (flags == Yarr::Flags::Invalid));
> +    if (UNLIKELY(flags == Yarr::Flags::Invalid))
>          return nullptr;

Seems like we should remove the unnecessary EXCEPTION_ASSERT trick here and just do RETURN_IF_EXCEPTION here. I don’t think it’s significantly better for performance and it requires us to preserve a special Invalid value which I don’t think we want unless it is critical for performance.

> Source/JavaScriptCore/runtime/RegExpConstructor.cpp:254
> +            EXCEPTION_ASSERT(!!scope.exception() == (flags == Yarr::Flags::Invalid));
> +            if (flags == Yarr::Flags::Invalid)
>                  return nullptr;

Ditto.

> Source/JavaScriptCore/runtime/RegExpPrototype.cpp:160
> -        RegExpFlags flags = NoFlags;
> +        OptionSet<Yarr::Flags> flags;
>          if (!arg1.isUndefined()) {
> -            flags = regExpFlags(arg1.toWTFString(exec));
> +            flags = Yarr::parseFlags(arg1.toWTFString(exec));
>              RETURN_IF_EXCEPTION(scope, encodedJSValue());
> -            if (flags == InvalidFlags)
> +            if (flags == Yarr::Flags::Invalid)
>                  return throwVMError(exec, scope, createSyntaxError(exec, "Invalid flags supplied to RegExp constructor."_s));
>          }

Seems like this should be sharing more code with the toFlags function in RegExpConstructor.cpp. This does a lot of the same work in slightly different ways, which seems unnecessary.

> Source/JavaScriptCore/testRegExp.cpp:332
> -    RegExp* r = RegExp::create(vm, pattern.toString(), regExpFlags(line + i));
> +    RegExp* r = RegExp::create(vm, pattern.toString(), Yarr::parseFlags(line + i));

Seems like we should handle the invalid flags here, rather than passing invalid flags into RegExp::create.

> Source/JavaScriptCore/yarr/YarrFlags.h:29
> +#include <wtf/OptionSet.h>
> +#include <wtf/text/StringView.h>

We don’t need either of these includes to compile this header. Forward.h would be sufficient.

> Source/JavaScriptCore/yarr/YarrFlags.h:40
> +    Invalid = 1 << 6,

Do we really need to keep Flags::Invalid? In most other similar cases, we just use Optional in places where we are parsing. Using Optional is nice because it makes certain mistakes fail to compile rather than doing something unexpected at runtime. For example, someone could parse and look for one of the flags and forget to check for Invalid. But you can’t compile code that makes that mistake with Optional (or Expected).

> Source/JavaScriptCore/yarr/YarrFlags.h:41
> +    DeletedValue = 1 << 7

On the other hand, I suppose we do need DeletedValue for OptionSet<Flags> so we can use it as a hash table key and I don’t object to keeping it even though it’s inelegant and worth looking for a more elegant yet efficient way to do it in the long run.

> Source/JavaScriptCore/yarr/YarrFlags.h:44
> +JS_EXPORT_PRIVATE OptionSet<Flags> parseFlags(StringView);

I suggest having this return Optional<OptionSet<Flags>> instead of using a special invalid value.

> Source/JavaScriptCore/yarr/YarrPattern.cpp:1110
> +    ASSERT(m_flags != Flags::Invalid);

This assertion is only needed because of the poor pattern of having an Invalid value that we should eliminate. Worth noting that a deleted value here is equally bad and we should consider asserting about that, too.

Also, I’m thinking maybe we should put the assertions in the constructor instead of here?
Comment 15 Ross Kirsling 2019-03-10 22:01:34 PDT
Created attachment 364232 [details]
Patch for landing
Comment 16 Ross Kirsling 2019-03-10 22:04:07 PDT
Comment on attachment 364232 [details]
Patch for landing

Patch updated to address all provided feedback. In particular, Flags::Invalid is now gone and the RegExp class is now free to assume flag validity.

Future patches:

- Early-error-ify bad references in Unicode patterns and eliminate RegExp::isValid altogether.

- Add Optional to Forward.h. (Right? Seems like it should already be there. I avoided doing it here since I figure we can probably move a whole bunch of includes to implementation files.)
Comment 17 WebKit Commit Bot 2019-03-10 23:20:59 PDT
Comment on attachment 364232 [details]
Patch for landing

Clearing flags on attachment: 364232

Committed r242699: <https://trac.webkit.org/changeset/242699>
Comment 18 WebKit Commit Bot 2019-03-10 23:21:01 PDT
All reviewed patches have been landed.  Closing bug.
Comment 19 Radar WebKit Bug Importer 2019-03-10 23:21:21 PDT
<rdar://problem/48759006>