Bug 50554 - RegExp.prototype.toString does not escape slashes
Summary: RegExp.prototype.toString does not escape slashes
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Gavin Barraclough
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-12-06 00:49 PST by Jan de Mooij
Modified: 2011-06-27 22:56 PDT (History)
6 users (show)

See Also:


Attachments
The patch (8.70 KB, patch)
2011-06-18 01:38 PDT, Gavin Barraclough
darin: review-
webkit.review.bot: commit-queue-
Details | Formatted Diff | Diff
Archive of layout-test-results from ec2-cr-linux-03 (1.54 MB, application/zip)
2011-06-18 02:10 PDT, WebKit Review Bot
no flags Details
New patch (8.87 KB, patch)
2011-06-18 19:19 PDT, Gavin Barraclough
oliver: review+
webkit.review.bot: commit-queue-
Details | Formatted Diff | Diff
Archive of layout-test-results from ec2-cr-linux-01 (1.43 MB, application/zip)
2011-06-18 19:37 PDT, WebKit Review Bot
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Jan de Mooij 2010-12-06 00:49:46 PST
Consider this: RegExp("abc/def").toString()

Result in FF and Opera: /abc\/def/
Result in Chrome and Safari: /abc/def/

It does work correctly for literal regexps like /abc\/def/.toString()

ES5 15.10.6.4 mentions this:
===
NOTE The returned String has the form of a RegularExpressionLiteral that evaluates to another RegExp object with the same behaviour as this object.
===
/abc/def/ is invalid as regexp literal, so the behavior is different in Safari/Chrome.

FWIW, here is the V8 bug: http://code.google.com/p/v8/issues/detail?id=956
Comment 1 Mark Hahnenberg 2011-01-13 18:13:37 PST
There also appears to be a related discrepancy between the behaviors of the source property of RegExp objects in Firefox and Chrome/Safari.  

RegExp("abc/def").source

Result in FF: abc\/def
Result in Chrome/Safari: abc/def

I have a patch that fixes the problem with toString() without modifying the behavior of the source property, but it would be much simpler if the behavior of the source property needed to change as well.  Is this what needs to happen?

From ES5 15.10.7.1:
"The value of the source property is a String in the form of a Pattern representing the current regular expression."

The grammar for Patterns (15.10.1) seems to indicate that the way Firefox does it is incorrect.
Comment 2 Gavin Barraclough 2011-01-14 11:40:29 PST
(In reply to comment #1)

Hi Mark,

> From ES5 15.10.7.1:
> "The value of the source property is a String in the form of a Pattern representing the current regular expression."

The term 'representing' seems ambiguous to me, but I guess we can safely assume a pattern represents a regular expression if it would compile to have the same [[Match]] behavior.

I'd suggest the use of the indefinite article is probably key in interpreting this sentence.  The spec refers to 'a' pattern rather than requiring a particular pattern - if the spec is intended to require 'the' pattern from which it was compiled I think it would say so.  As such, given the RegExp /ABC/ I'd suggest that an implementation could return "ABC", "\x61\x62\x63", or any other pattern that would compile to produce the same expression, and still be considered to be in compliance with the spec.

> The grammar for Patterns (15.10.1) seems to indicate that the way Firefox does it is incorrect.

Sorry - I don't follow, what's your reasoning here? - based on the interpretation above, FF's pattern does represent the source, and "abc\/def" is a valid pattern per section 15.10.1 of the spec ("\/" is a valid IdentityEscape).  Are you assuming a more strict definition of 'source'? - that it must match the pattern compiled from?

> I have a patch that fixes the problem with toString() without modifying the behavior of the source property, but it would be much simpler if the behavior of the source property needed to change as well.  Is this what needs to happen?

Great!

If FireFox's behavior is incorrect, then we should stick to producing "abc/def" for source, & just change toString.  But assuming my interpretation of the spec is correct, I'd say changing 'source' too to match FireFox make sense.

pros to switching:
* it's good from a web compatibility perspective to match FireFox.
* from a developer's perspective, the escaped form is more useful (can be concatenated into a string & eval'ed as a valid RegularExpressionLiteral).
* from a developer's perspective, I'd argue having two string representations for a RegExp would be potentially confusing & could be problematic, likely better to have a single canonical string representation associated with a given RegExp.

cons to switching:
* intuitively it seems like 'source' should actually be the source the expression was compiled from.

On balance, I'd certainly favour following FF on this one.

cheers,
G.

[ I think there is probably a cleaner and simpler solution, that is also likely completely impractical.  The fundamental problem here is that there are valid patterns that are not a valid RegularExpressionLiteral.  If we could just amend the spec to make an unescaped forwards slash an invalid pattern character, then this issue should go away. :-P ]
Comment 3 Mark Hahnenberg 2011-01-14 12:29:08 PST
(In reply to comment #2)

> Sorry - I don't follow, what's your reasoning here? - based on the interpretation above, FF's pattern does represent the source, and "abc\/def" is a valid pattern per section 15.10.1 of the spec ("\/" is a valid IdentityEscape).  Are you assuming a more strict definition of 'source'? - that it must match the pattern compiled from?

I misread the grammar for Patterns.  You are correct.

> pros to switching:
> * it's good from a web compatibility perspective to match FireFox.
> * from a developer's perspective, the escaped form is more useful (can be concatenated into a string & eval'ed as a valid RegularExpressionLiteral).
> * from a developer's perspective, I'd argue having two string representations for a RegExp would be potentially confusing & could be problematic, likely better to have a single canonical string representation associated with a given RegExp.
> 
> cons to switching:
> * intuitively it seems like 'source' should actually be the source the expression was compiled from.
> 
> On balance, I'd certainly favour following FF on this one.

I agree with this :-)  I'll make the necessary changes to make source and toString() match Firefox's behavior (interestingly, RegExp objects created using the literal notation already match Firefox's behavior).
Comment 4 Gavin Barraclough 2011-06-18 01:38:21 PDT
Created attachment 97691 [details]
The patch
Comment 5 WebKit Review Bot 2011-06-18 01:40:08 PDT
Attachment 97691 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'LayoutTests/ChangeLog', u'LayoutTests/fast..." exit_code: 1

Source/JavaScriptCore/runtime/RegExpObject.cpp:130:  Missing space before ( in while(  [whitespace/parens] [5]
Source/JavaScriptCore/runtime/RegExpObject.cpp:138:  Missing space before {  [whitespace/braces] [5]
Total errors found: 2 in 7 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 6 WebKit Review Bot 2011-06-18 02:10:28 PDT
Comment on attachment 97691 [details]
The patch

Attachment 97691 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/8874811

New failing tests:
fast/regex/toString.html
Comment 7 WebKit Review Bot 2011-06-18 02:10:33 PDT
Created attachment 97694 [details]
Archive of layout-test-results from ec2-cr-linux-03

The attached test failures were seen while running run-webkit-tests on the chromium-ews.
Bot: ec2-cr-linux-03  Port: Chromium  Platform: Linux-2.6.35-28-virtual-x86_64-with-Ubuntu-10.10-maverick
Comment 8 Darin Adler 2011-06-18 10:01:36 PDT
Comment on attachment 97691 [details]
The patch

View in context: https://bugs.webkit.org/attachment.cgi?id=97691&action=review

Looks like this caused some test failures"

  jquery/manipulation.html = TIMEOUT

Regressions: Unexpected text diff mismatch : (1)
  fast/regex/toString.html = TEXT

Could you check into those?

> Source/JavaScriptCore/runtime/RegExpObject.cpp:117
> +    size_t fwdSlash = pattern.find('/');

This local variable does not contain a forward slash. I think a better name would be forwardSlashLocation or forwardSlashPosition.

> Source/JavaScriptCore/runtime/RegExpObject.cpp:129
> +        size_t slashes = fwdSlash;

I would call this slashesPosition instead of slashes. Again, it's a number, not slashes, which is why.

>> Source/JavaScriptCore/runtime/RegExpObject.cpp:130
>> +        while(slashes && pattern[slashes - 1] == '\\')
> 
> Missing space before ( in while(  [whitespace/parens] [5]

The style bot is right about this missing space.

> Source/JavaScriptCore/runtime/RegExpObject.cpp:135
> +        // doubel escape it.

Typo: doubel

> Source/JavaScriptCore/runtime/RegExpObject.cpp:137
> +            result.append(pattern.substringSharingImpl(completed, fwdSlash + 1));

This tells me that substringSharingImpl needs a better name; we need a name that makes it clearer when you need to use it and when you must not use it.

> LayoutTests/fast/regex/script-tests/toString.js:11
> +    shouldBeTrue("re1.test(string)");
> +    shouldBeTrue("re2.test(string)");

Can you write this so that the patterns are visible in the shouldBe expressions, making the test output clearer? I think you should.

> LayoutTests/fast/regex/script-tests/toString.js:22
> +// These strings are equilvalent, since the '\' is identity escaping the '/' at the string level.

Typo: equivalent
Comment 9 Gavin Barraclough 2011-06-18 12:20:44 PDT
> Looks like this caused some test failures"
> 
>   jquery/manipulation.html = TIMEOUT

This passes for me locally, and I think it's unlikely I introduced any regression here; my patch only effect JSC, and this appears to be the cr-linux bot failing?

> Regressions: Unexpected text diff mismatch : (1)
>   fast/regex/toString.html = TEXT

Again this looks like the cr-linux bot failing? - so I'm assuming this is v8 failing the new test cases (per comments below, their behavior matches our old one & so they will have this bug to fix too).
Comment 10 Gavin Barraclough 2011-06-18 19:19:23 PDT
Created attachment 97712 [details]
New patch
Comment 11 WebKit Review Bot 2011-06-18 19:37:40 PDT
Comment on attachment 97712 [details]
New patch

Attachment 97712 [details] did not pass chromium-ews (chromium-xvfb):
Output: http://queues.webkit.org/results/8913006

New failing tests:
fast/regex/toString.html
Comment 12 WebKit Review Bot 2011-06-18 19:37:45 PDT
Created attachment 97714 [details]
Archive of layout-test-results from ec2-cr-linux-01

The attached test failures were seen while running run-webkit-tests on the chromium-ews.
Bot: ec2-cr-linux-01  Port: Chromium  Platform: Linux-2.6.35-28-virtual-x86_64-with-Ubuntu-10.10-maverick
Comment 13 Gavin Barraclough 2011-06-27 22:56:05 PDT
Fixed in r89895