Bug 71572 - String new RegExp('\n').toString() returns is invalid RegularExpressionLiteral
Summary: String new RegExp('\n').toString() returns is invalid RegularExpressionLiteral
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-11-04 10:55 PDT by Yusuke Suzuki
Modified: 2011-11-16 10:44 PST (History)
4 users (show)

See Also:


Attachments
Patch (6.87 KB, patch)
2011-11-04 11:04 PDT, Yusuke Suzuki
no flags Details | Formatted Diff | Diff
Patch (7.87 KB, patch)
2011-11-15 13:33 PST, Yusuke Suzuki
no flags Details | Formatted Diff | Diff
Patch (8.11 KB, patch)
2011-11-15 14:24 PST, Yusuke Suzuki
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Yusuke Suzuki 2011-11-04 10:55:38 PDT
new RegExp('\n').toString() returns 
/
/
, so this result is including LineTerminator.

And, according to ECMA262 section 15.10.6.4
"The returned String has the form of a RegularExpressionLiteral that evaluates to another RegExp object with the same behaviour as this object."
http://es5.github.com/#x15.10.6.4

And, because of RegularExpressionNonTerminator, LineTerminator is not allowed in RegularExpressionLiteral. So RegExp.source should be escaped.

Furthermore, new RegExp('\\\n').toString() should provide /\n/, because /\\\n/ doesn't have the same behaviour to new RegExp('\\\n').
Comment 1 Yusuke Suzuki 2011-11-04 11:04:32 PDT
Created attachment 113682 [details]
Patch

attached patch
Comment 2 Yusuke Suzuki 2011-11-05 04:47:49 PDT
reported issue and sent patch to V8
http://code.google.com/p/v8/issues/detail?id=1813
Comment 3 Gavin Barraclough 2011-11-13 14:09:36 PST
Comment on attachment 113682 [details]
Patch

Good catch & nice fix. But I think we may want to make one tweak to this patch before landing.  Most regular expressions don't contain forwards slashes or newlines, but this patch will always produce a fresh copy of the string every time source is called. Currently the code has an early return for strings that don't contain a forwards slash. We could expand this to also check for newline characters, and early return if the string contains neither forward slashes or newlines. r- for the loss of this optimization, but r+ in principle, we should definitely take this fix & the code looks great otherwise. :-)
Comment 4 Yusuke Suzuki 2011-11-15 12:18:16 PST
(In reply to comment #3)
> (From update of attachment 113682 [details])
> Good catch & nice fix. But I think we may want to make one tweak to this patch before landing.  Most regular expressions don't contain forwards slashes or newlines, but this patch will always produce a fresh copy of the string every time source is called. Currently the code has an early return for strings that don't contain a forwards slash. We could expand this to also check for newline characters, and early return if the string contains neither forward slashes or newlines. r- for the loss of this optimization, but r+ in principle, we should definitely take this fix & the code looks great otherwise. :-)

Thanks for your review. You're right. I'll add early return to this patch.
Comment 5 Yusuke Suzuki 2011-11-15 13:33:02 PST
Created attachment 115232 [details]
Patch

attached patch v2. First, checking this pattern should be escaped, and if it is false return early
Comment 6 Darin Adler 2011-11-15 13:44:40 PST
Comment on attachment 115232 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=115232&action=review

Looks OK. Needs a couple small changes.

> Source/JavaScriptCore/runtime/RegExpObject.cpp:124
> +    bool previousIsBackSlash = false;

backslash is one word, not two, so this should be named previousCharacterWasBackslash

> Source/JavaScriptCore/runtime/RegExpObject.cpp:145
> +        if (ch == '\n' || ch == '\r' || (ch & ~1) == 0x2028) {

This should use the inline isLineTerminator function instead of repeating its code.

> Source/JavaScriptCore/runtime/RegExpObject.cpp:157
> +    if (!shouldEscape)
> +      return jsString(exec, pattern);

This is indented wrong. Should be indented four spaces.

> Source/JavaScriptCore/runtime/RegExpObject.cpp:177
> +        if (ch == '\n' || ch == '\r' || (ch & ~1) == 0x2028) {

This should use the inline isLineTerminator function instead of repeating its code.
Comment 7 Yusuke Suzuki 2011-11-15 14:21:29 PST
(In reply to comment #6)
Thank you for your review.

> backslash is one word, not two, so this should be named previousCharacterWasBackslash
Yes, I'll rename it.

> > Source/JavaScriptCore/runtime/RegExpObject.cpp:145
> > +        if (ch == '\n' || ch == '\r' || (ch & ~1) == 0x2028) {
> This should use the inline isLineTerminator function instead of repeating its code.
Right, I'll change it to using Lexer<UChar>::isLineTerminator.

> This is indented wrong. Should be indented four spaces.
Oops. This is mistake.

I'll upload revised patch soon.
Comment 8 Yusuke Suzuki 2011-11-15 14:24:10 PST
Created attachment 115244 [details]
Patch

attached patch v3
Comment 9 Gavin Barraclough 2011-11-16 10:08:56 PST
Comment on attachment 115244 [details]
Patch

Looks great.
Comment 10 WebKit Review Bot 2011-11-16 10:44:35 PST
Comment on attachment 115244 [details]
Patch

Clearing flags on attachment: 115244

Committed r100469: <http://trac.webkit.org/changeset/100469>
Comment 11 WebKit Review Bot 2011-11-16 10:44:40 PST
All reviewed patches have been landed.  Closing bug.