Bug 151597 - Some tests fail with ES6 `u` (Unicode) flag for regular expressions
Summary: Some tests fail with ES6 `u` (Unicode) flag for regular expressions
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Michael Saboff
URL:
Keywords: InRadar
Depends on: 156044
Blocks: 151598
  Show dependency treegraph
 
Reported: 2015-11-25 02:26 PST by Mathias Bynens
Modified: 2016-06-23 02:18 PDT (History)
12 users (show)

See Also:


Attachments
Patch addressing \w and \W with "iu" flags (13.02 KB, patch)
2016-04-13 16:54 PDT, Michael Saboff
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Comment 1 Mathias Bynens 2015-11-25 02:29:29 PST
More background on the `u` flag for regular expressions: https://mathiasbynens.be/notes/es6-unicode-regex
Comment 2 Mathias Bynens 2016-03-30 12:44:07 PDT
Seems like this has been implemented in Safari Technology Preview v9.1.1 (11601.6.10, 11602.1.25).

However, the implementation is buggy: http://mathias.html5.org/tests/javascript/regexp/

The following tests fail:

    assert_equals(/𝌆{2}/u.test('𝌆𝌆'), true);
    assert_equals(/\uD834\uDF06{2}/u.test('\uD834\uDF06\uD834\uDF06'), true);

    assert_equals(/\W/iu.test('S'), true);
    assert_equals(/\W/iu.test('K'), true);


Please fix this before shipping this in a stable release to avoid compatibility problems.
Comment 3 Radar WebKit Bug Importer 2016-03-30 12:48:38 PDT
<rdar://problem/25447036>
Comment 4 Timothy Hatcher 2016-03-30 12:53:08 PDT
This was implemented with bug 154842.
Comment 5 Michael Saboff 2016-03-30 15:59:00 PDT
(In reply to comment #2)
> Seems like this has been implemented in Safari Technology Preview v9.1.1
> (11601.6.10, 11602.1.25).
> 
> However, the implementation is buggy:
> http://mathias.html5.org/tests/javascript/regexp/
> 
> The following tests fail:
> 
>     assert_equals(/𝌆{2}/u.test('𝌆𝌆'), true);
>     assert_equals(/\uD834\uDF06{2}/u.test('\uD834\uDF06\uD834\uDF06'), true);

These two tests point out bug in quantified unicode regular expression processing.

>     assert_equals(/\W/iu.test('S'), true);
>     assert_equals(/\W/iu.test('K'), true);

According the CharacterClassEscape pattern semantic rules specified in the ES6 spec section 21.2.2.12 (https://tc39.github.io/ecma262/2016/#sec-characterclassescape) along with the canonicalization rules found at 21.2.2.8.2 (https://tc39.github.io/ecma262/2016/#sec-runtime-semantics-canonicalize-ch), upper case ASCII 'S' and 'K' ARE word characters and therefore should fail with the non-word, \W, character class.  This also holds true for when the ignore case flag is provided.

Note that the Chrome team believes that the current Chrome canary (51.0.2692.0 canary) incorrectly handles these two test cases.  This Chrome issue is tracked in https://bugs.chromium.org/p/v8/issues/detail?id=4879.

> Please fix this before shipping this in a stable release to avoid
> compatibility problems.

I created a new bug (https://bugs.webkit.org/show_bug.cgi?id=156044) to track just the quantified unicode RegExp test failures.
Comment 6 Mathias Bynens 2016-03-30 23:27:40 PDT
(In reply to comment #5)
> (In reply to comment #2)
> > Seems like this has been implemented in Safari Technology Preview v9.1.1
> > (11601.6.10, 11602.1.25).
> > 
> > However, the implementation is buggy:
> > http://mathias.html5.org/tests/javascript/regexp/
> > 
> > The following tests fail:
> > 
> >     assert_equals(/𝌆{2}/u.test('𝌆𝌆'), true);
> >     assert_equals(/\uD834\uDF06{2}/u.test('\uD834\uDF06\uD834\uDF06'), true);
> 
> These two tests point out bug in quantified unicode regular expression
> processing.
> 
> >     assert_equals(/\W/iu.test('S'), true);
> >     assert_equals(/\W/iu.test('K'), true);
> 
> According the CharacterClassEscape pattern semantic rules specified in the
> ES6 spec section 21.2.2.12
> (https://tc39.github.io/ecma262/2016/#sec-characterclassescape) along with
> the canonicalization rules found at 21.2.2.8.2
> (https://tc39.github.io/ecma262/2016/#sec-runtime-semantics-canonicalize-ch),
> upper case ASCII 'S' and 'K' ARE word characters and therefore should fail
> with the non-word, \W, character class.

Without the `u` and `i` flags enabled, this statements is entirely correct.

> This also holds true for when the ignore case flag is provided.

This is incorrect, though. Did you read the explanation at https://mathiasbynens.be/notes/es6-unicode-regex#impact-i?

> Note that the Chrome team believes that the current Chrome canary
> (51.0.2692.0 canary) incorrectly handles these two test cases.  This Chrome
> issue is tracked in https://bugs.chromium.org/p/v8/issues/detail?id=4879.

No, they got it right: https://bugs.chromium.org/p/v8/issues/detail?id=4879#c3

> I created a new bug (https://bugs.webkit.org/show_bug.cgi?id=156044) to
> track just the quantified unicode RegExp test failures.

Thanks.
Comment 7 Michael Saboff 2016-04-13 14:59:10 PDT
As the standard is currently written, /\W/iu should match 's', 'k', 'S' and 'K'.  I disagree with the standard and have created a pull request to change the standard.  That request can be found at https://github.com/tc39/ecma262/pull/525.  In the mean time, I will fix the implementation.
Comment 8 Michael Saboff 2016-04-13 16:54:23 PDT
Created attachment 276366 [details]
Patch addressing \w and \W with "iu" flags
Comment 9 Geoffrey Garen 2016-04-13 16:56:08 PDT
Comment on attachment 276366 [details]
Patch addressing \w and \W with "iu" flags

r=me
Comment 10 WebKit Commit Bot 2016-04-13 17:47:32 PDT
Comment on attachment 276366 [details]
Patch addressing \w and \W with "iu" flags

Clearing flags on attachment: 276366

Committed r199523: <http://trac.webkit.org/changeset/199523>
Comment 11 WebKit Commit Bot 2016-04-13 17:47:35 PDT
All reviewed patches have been landed.  Closing bug.