Spec: https://tc39.github.io/ecma262/#sec-pattern-semantics
More background: https://mathiasbynens.be/notes/javascript-unicode
More background on the `u` flag for regular expressions: https://mathiasbynens.be/notes/es6-unicode-regex
Seems like this has been implemented in Safari Technology Preview v9.1.1 (11601.6.10, 11602.1.25).

However, the implementation is buggy: http://mathias.html5.org/tests/javascript/regexp/

The following tests fail:

assert_equals(/𝌆{2}/u.test('𝌆𝌆'), true);
assert_equals(/\uD834\uDF06{2}/u.test('\uD834\uDF06\uD834\uDF06'), true);
assert_equals(/\W/iu.test('S'), true);
assert_equals(/\W/iu.test('K'), true);

Please fix this before shipping this in a stable release to avoid compatibility problems.
<rdar://problem/25447036>
This was implemented with bug 154842.
(In reply to comment #2)
> Seems like this has been implemented in Safari Technology Preview v9.1.1
> (11601.6.10, 11602.1.25).
>
> However, the implementation is buggy:
> http://mathias.html5.org/tests/javascript/regexp/
>
> The following tests fail:
>
> assert_equals(/𝌆{2}/u.test('𝌆𝌆'), true);
> assert_equals(/\uD834\uDF06{2}/u.test('\uD834\uDF06\uD834\uDF06'), true);

These two tests point out a bug in quantified Unicode regular expression processing.

> assert_equals(/\W/iu.test('S'), true);
> assert_equals(/\W/iu.test('K'), true);

According to the CharacterClassEscape pattern semantic rules specified in the ES6 spec section 21.2.2.12 (https://tc39.github.io/ecma262/2016/#sec-characterclassescape), along with the canonicalization rules found at 21.2.2.8.2 (https://tc39.github.io/ecma262/2016/#sec-runtime-semantics-canonicalize-ch), upper case ASCII 'S' and 'K' ARE word characters and therefore should fail with the non-word, \W, character class. This also holds true when the ignore case flag is provided.

Note that the Chrome team believes that the current Chrome canary (51.0.2692.0 canary) incorrectly handles these two test cases. This Chrome issue is tracked in https://bugs.chromium.org/p/v8/issues/detail?id=4879.

> Please fix this before shipping this in a stable release to avoid
> compatibility problems.

I created a new bug (https://bugs.webkit.org/show_bug.cgi?id=156044) to track just the quantified Unicode RegExp test failures.
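For readers following along, the quantifier half of the report can be reproduced with a short script. This is only a sketch of the behavior the two failing tests check, not the WebKit test itself; the variable names are made up for illustration. It assumes an engine that implements the `u` flag as specified.

```javascript
// U+1D306 (𝌆) is stored in a JavaScript string as the surrogate pair \uD834\uDF06.
const tetragram = '\u{1D306}';
const doubled = tetragram + tetragram; // four code units, two code points

// Without `u`, the pattern is a sequence of code units, so `{2}` binds only to
// the trailing surrogate: the pattern asks for \uD834 \uDF06 \uDF06.
const withoutU = /\uD834\uDF06{2}/.test(doubled); // false

// With `u`, the surrogate pair is a single pattern character, so `{2}` applies
// to the whole code point. These are the two assertions from the report.
const withU = /\uD834\uDF06{2}/u.test(doubled);
const literalWithU = /𝌆{2}/u.test(doubled);

console.log({ withoutU, withU, literalWithU });
```

An engine with the bug described above would report `false` for the last two values; a conforming one reports `true` for both.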
(In reply to comment #5)
> (In reply to comment #2)
> > Seems like this has been implemented in Safari Technology Preview v9.1.1
> > (11601.6.10, 11602.1.25).
> >
> > However, the implementation is buggy:
> > http://mathias.html5.org/tests/javascript/regexp/
> >
> > The following tests fail:
> >
> > assert_equals(/𝌆{2}/u.test('𝌆𝌆'), true);
> > assert_equals(/\uD834\uDF06{2}/u.test('\uD834\uDF06\uD834\uDF06'), true);
>
> These two tests point out a bug in quantified unicode regular expression
> processing.
>
> > assert_equals(/\W/iu.test('S'), true);
> > assert_equals(/\W/iu.test('K'), true);
>
> According to the CharacterClassEscape pattern semantic rules specified in the
> ES6 spec section 21.2.2.12
> (https://tc39.github.io/ecma262/2016/#sec-characterclassescape) along with
> the canonicalization rules found at 21.2.2.8.2
> (https://tc39.github.io/ecma262/2016/#sec-runtime-semantics-canonicalize-ch),
> upper case ASCII 'S' and 'K' ARE word characters and therefore should fail
> with the non-word, \W, character class.

Without the `u` and `i` flags enabled, this statement is entirely correct.

> This also holds true for when the ignore case flag is provided.

This is incorrect, though. Did you read the explanation at https://mathiasbynens.be/notes/es6-unicode-regex#impact-i?

> Note that the Chrome team believes that the current Chrome canary
> (51.0.2692.0 canary) incorrectly handles these two test cases. This Chrome
> issue is tracked in https://bugs.chromium.org/p/v8/issues/detail?id=4879.

No, they got it right: https://bugs.chromium.org/p/v8/issues/detail?id=4879#c3

> I created a new bug (https://bugs.webkit.org/show_bug.cgi?id=156044) to
> track just the quantified unicode RegExp test failures.

Thanks.
As the standard is currently written, /\W/iu should match 's', 'k', 'S' and 'K'. I disagree with the standard and have created a pull request to change it. That request can be found at https://github.com/tc39/ecma262/pull/525. In the meantime, I will fix the implementation.
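A rough sketch of why the spec text gives this result, for anyone landing on this bug later. The assumption here, based on the explanation linked in comment #6, is that the two characters responsible are U+017F (LATIN SMALL LETTER LONG S) and U+212A (KELVIN SIGN): both are non-word characters, and with `i` and `u` together they case-fold to 's' and 'k'. The constant and variable names are hypothetical.

```javascript
const LONG_S = '\u017F'; // ſ, LATIN SMALL LETTER LONG S
const KELVIN = '\u212A'; // K, KELVIN SIGN

// Neither character is in [A-Za-z0-9_], so without `i` both are matched by \W.
const nonWordWithoutI = [/\W/u.test(LONG_S), /\W/u.test(KELVIN)]; // [true, true]

// With `i` and `u`, canonicalization uses Unicode simple case folding,
// which maps ſ to 's' and the Kelvin sign to 'k'.
const foldsWithIU = [/\u017F/iu.test('S'), /\u212A/iu.test('K')]; // [true, true]

// With `i` alone, canonicalization is based on upper-casing and never maps a
// non-ASCII character onto an ASCII one, so neither pair matches.
const foldsWithIOnly = [/\u017F/i.test('S'), /\u212A/i.test('K')]; // [false, false]

// Put together: under `iu`, \W contains characters that fold to 's' and 'k',
// which is how the spec text discussed here lets /\W/iu match 'S' and 'K'.
// The value logged below depends on which version of the spec the engine
// follows, so no expected result is given.
console.log(nonWordWithoutI, foldsWithIU, foldsWithIOnly, /\W/iu.test('S'));
```

The first three results follow from the case-folding rules alone; only the last expression is affected by the pull request mentioned above.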
Created attachment 276366: Patch addressing \w and \W with "iu" flags
Comment on attachment 276366: Patch addressing \w and \W with "iu" flags
r=me
Comment on attachment 276366: Patch addressing \w and \W with "iu" flags
Clearing flags on attachment: 276366
Committed r199523: <http://trac.webkit.org/changeset/199523>
All reviewed patches have been landed. Closing bug.