Bug 151597

Summary: Some tests fail with ES6 `u` (Unicode) flag for regular expressions
Product: WebKit Reporter: Mathias Bynens <mathias>
Component: JavaScriptCoreAssignee: Michael Saboff <msaboff>
Status: RESOLVED FIXED    
Severity: Normal CC: 50167214, commit-queue, ggaren, keith_miller, mark.lam, mathias, msaboff, oliver, saam, timothy, webkit-bug-importer, ysuzuki
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
See Also: https://bugs.webkit.org/show_bug.cgi?id=159055
Bug Depends on: 156044    
Bug Blocks: 151598    
Attachments:
Description Flags
Patch addressing \w and \W with "iu" flags none

Comment 1 Mathias Bynens 2015-11-25 02:29:29 PST
More background on the `u` flag for regular expressions: https://mathiasbynens.be/notes/es6-unicode-regex
Comment 2 Mathias Bynens 2016-03-30 12:44:07 PDT
Seems like this has been implemented in Safari Technology Preview v9.1.1 (11601.6.10, 11602.1.25).

However, the implementation is buggy: http://mathias.html5.org/tests/javascript/regexp/

The following tests fail:

    assert_equals(/𝌆{2}/u.test('𝌆𝌆'), true);
    assert_equals(/\uD834\uDF06{2}/u.test('\uD834\uDF06\uD834\uDF06'), true);

    assert_equals(/\W/iu.test('S'), true);
    assert_equals(/\W/iu.test('K'), true);


Please fix this before shipping this in a stable release to avoid compatibility problems.
Comment 3 Radar WebKit Bug Importer 2016-03-30 12:48:38 PDT
<rdar://problem/25447036>
Comment 4 Timothy Hatcher 2016-03-30 12:53:08 PDT
This was implemented with bug 154842.
Comment 5 Michael Saboff 2016-03-30 15:59:00 PDT
(In reply to comment #2)
> Seems like this has been implemented in Safari Technology Preview v9.1.1
> (11601.6.10, 11602.1.25).
> 
> However, the implementation is buggy:
> http://mathias.html5.org/tests/javascript/regexp/
> 
> The following tests fail:
> 
>     assert_equals(/𝌆{2}/u.test('𝌆𝌆'), true);
>     assert_equals(/\uD834\uDF06{2}/u.test('\uD834\uDF06\uD834\uDF06'), true);

These two tests point out bug in quantified unicode regular expression processing.

>     assert_equals(/\W/iu.test('S'), true);
>     assert_equals(/\W/iu.test('K'), true);

According the CharacterClassEscape pattern semantic rules specified in the ES6 spec section 21.2.2.12 (https://tc39.github.io/ecma262/2016/#sec-characterclassescape) along with the canonicalization rules found at 21.2.2.8.2 (https://tc39.github.io/ecma262/2016/#sec-runtime-semantics-canonicalize-ch), upper case ASCII 'S' and 'K' ARE word characters and therefore should fail with the non-word, \W, character class.  This also holds true for when the ignore case flag is provided.

Note that the Chrome team believes that the current Chrome canary (51.0.2692.0 canary) incorrectly handles these two test cases.  This Chrome issue is tracked in https://bugs.chromium.org/p/v8/issues/detail?id=4879.

> Please fix this before shipping this in a stable release to avoid
> compatibility problems.

I created a new bug (https://bugs.webkit.org/show_bug.cgi?id=156044) to track just the quantified unicode RegExp test failures.
Comment 6 Mathias Bynens 2016-03-30 23:27:40 PDT
(In reply to comment #5)
> (In reply to comment #2)
> > Seems like this has been implemented in Safari Technology Preview v9.1.1
> > (11601.6.10, 11602.1.25).
> > 
> > However, the implementation is buggy:
> > http://mathias.html5.org/tests/javascript/regexp/
> > 
> > The following tests fail:
> > 
> >     assert_equals(/𝌆{2}/u.test('𝌆𝌆'), true);
> >     assert_equals(/\uD834\uDF06{2}/u.test('\uD834\uDF06\uD834\uDF06'), true);
> 
> These two tests point out bug in quantified unicode regular expression
> processing.
> 
> >     assert_equals(/\W/iu.test('S'), true);
> >     assert_equals(/\W/iu.test('K'), true);
> 
> According the CharacterClassEscape pattern semantic rules specified in the
> ES6 spec section 21.2.2.12
> (https://tc39.github.io/ecma262/2016/#sec-characterclassescape) along with
> the canonicalization rules found at 21.2.2.8.2
> (https://tc39.github.io/ecma262/2016/#sec-runtime-semantics-canonicalize-ch),
> upper case ASCII 'S' and 'K' ARE word characters and therefore should fail
> with the non-word, \W, character class.

Without the `u` and `i` flags enabled, this statements is entirely correct.

> This also holds true for when the ignore case flag is provided.

This is incorrect, though. Did you read the explanation at https://mathiasbynens.be/notes/es6-unicode-regex#impact-i?

> Note that the Chrome team believes that the current Chrome canary
> (51.0.2692.0 canary) incorrectly handles these two test cases.  This Chrome
> issue is tracked in https://bugs.chromium.org/p/v8/issues/detail?id=4879.

No, they got it right: https://bugs.chromium.org/p/v8/issues/detail?id=4879#c3

> I created a new bug (https://bugs.webkit.org/show_bug.cgi?id=156044) to
> track just the quantified unicode RegExp test failures.

Thanks.
Comment 7 Michael Saboff 2016-04-13 14:59:10 PDT
As the standard is currently written, /\W/iu should match 's', 'k', 'S' and 'K'.  I disagree with the standard and have created a pull request to change the standard.  That request can be found at https://github.com/tc39/ecma262/pull/525.  In the mean time, I will fix the implementation.
Comment 8 Michael Saboff 2016-04-13 16:54:23 PDT
Created attachment 276366 [details]
Patch addressing \w and \W with "iu" flags
Comment 9 Geoffrey Garen 2016-04-13 16:56:08 PDT
Comment on attachment 276366 [details]
Patch addressing \w and \W with "iu" flags

r=me
Comment 10 WebKit Commit Bot 2016-04-13 17:47:32 PDT
Comment on attachment 276366 [details]
Patch addressing \w and \W with "iu" flags

Clearing flags on attachment: 276366

Committed r199523: <http://trac.webkit.org/changeset/199523>
Comment 11 WebKit Commit Bot 2016-04-13 17:47:35 PDT
All reviewed patches have been landed.  Closing bug.