WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
Bug 172069
Implement RegExp Unicode property escapes
https://bugs.webkit.org/show_bug.cgi?id=172069
Summary
Implement RegExp Unicode property escapes
Mathias Bynens
Reported
2017-05-13 02:42:15 PDT
This proposal is currently at Stage 3 and seeking implementations:
https://github.com/tc39/proposal-regexp-unicode-property-escapes
Tests:
https://github.com/tc39/test262/tree/master/test/built-ins/RegExp/property-escapes
Attachments
UCD Datefiles for Patch
(3.84 MB, patch)
2017-10-06 15:11 PDT
,
Michael Saboff
no flags
Details
Formatted Diff
Diff
Main Patch
(399.79 KB, patch)
2017-10-06 15:12 PDT
,
Michael Saboff
jfbastien
: review+
Details
Formatted Diff
Diff
Combined patch for EWS
(4.23 MB, patch)
2017-10-06 15:16 PDT
,
Michael Saboff
no flags
Details
Formatted Diff
Diff
Patch for EWS with suggested changes and speculative build fixes.
(4.23 MB, patch)
2017-10-06 17:21 PDT
,
Michael Saboff
no flags
Details
Formatted Diff
Diff
Patch for EWS with updated build fixes
(4.23 MB, patch)
2017-10-09 11:48 PDT
,
Michael Saboff
buildbot
: commit-queue-
Details
Formatted Diff
Diff
Archive of layout-test-results from ews100 for mac-elcapitan
(991.82 KB, application/zip)
2017-10-09 13:09 PDT
,
Build Bot
no flags
Details
Archive of layout-test-results from ews105 for mac-elcapitan-wk2
(1.12 MB, application/zip)
2017-10-09 13:13 PDT
,
Build Bot
no flags
Details
Patch for EWS with test fix
(4.23 MB, patch)
2017-10-09 13:19 PDT
,
Michael Saboff
no flags
Details
Formatted Diff
Diff
Show Obsolete
(5)
View All
Add attachment
proposed patch, testcase, etc.
Radar WebKit Bug Importer
Comment 1
2017-05-13 20:23:48 PDT
<
rdar://problem/32182540
>
Michael Saboff
Comment 2
2017-09-27 06:43:09 PDT
Active Radar for the bug is <
rdar://problem/33183184
>
Michael Saboff
Comment 3
2017-10-06 15:11:54 PDT
Created
attachment 323050
[details]
UCD Datefiles for Patch These are Unicode Database version 10 files that are input for the main patch.
Michael Saboff
Comment 4
2017-10-06 15:12:15 PDT
Created
attachment 323051
[details]
Main Patch
Build Bot
Comment 5
2017-10-06 15:14:39 PDT
Attachment 323051
[details]
did not pass style-queue: ERROR: Source/JavaScriptCore/yarr/YarrUnicodeProperties.cpp:72: Alphabetical sorting problem. [build/include_order] [4] ERROR: Source/JavaScriptCore/yarr/YarrPattern.cpp:1026: One space before end of line comments [whitespace/comments] [5] Total errors found: 2 in 21 files If any of these errors are false positives, please file a bug against check-webkit-style.
Michael Saboff
Comment 6
2017-10-06 15:16:55 PDT
Created
attachment 323053
[details]
Combined patch for EWS Review the "Main Patch".
Build Bot
Comment 7
2017-10-06 15:19:18 PDT
Attachment 323053
[details]
did not pass style-queue: ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:738: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:996: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:1135: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:1692: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/ScriptExtensions.txt:38: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/yarr/YarrPattern.cpp:1026: One space before end of line comments [whitespace/comments] [5] ERROR: Source/JavaScriptCore/yarr/YarrUnicodeProperties.cpp:72: Alphabetical sorting problem. [build/include_order] [4] ERROR: Source/JavaScriptCore/ucd/Scripts.txt:18: Line contains tab character. [whitespace/tab] [5] Total errors found: 8 in 31 files If any of these errors are false positives, please file a bug against check-webkit-style.
Build Bot
Comment 8
2017-10-06 15:29:07 PDT
Attachment 323050
[details]
did not pass style-queue: ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:738: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:996: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:1135: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:1692: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/Scripts.txt:18: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/ScriptExtensions.txt:38: Line contains tab character. [whitespace/tab] [5] Total errors found: 6 in 10 files If any of these errors are false positives, please file a bug against check-webkit-style.
JF Bastien
Comment 9
2017-10-06 15:53:39 PDT
Comment on
attachment 323051
[details]
Main Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=323051&action=review
r=me
> Source/JavaScriptCore/ChangeLog:22 > + names to the function index. Using these hashing tables, we can lookup up a property
up up up up
> Source/JavaScriptCore/Scripts/generateYarrUnicodePropertyTables.py:1 > +#! /usr/bin/python
#!/usr/bin/env python
> Source/JavaScriptCore/Scripts/generateYarrUnicodePropertyTables.py:62 > +// DO NO EDIT! - This file was generated by generateYarrUnicodePropertyTables.py
""" + __file__ + """
> Source/JavaScriptCore/Scripts/generateYarrUnicodePropertyTables.py:117 > + exit(1)
This is all a lot of error handling for a python script! Such dedication, I'd just `with open(...) as ...:` and call it a day. :)
> Source/JavaScriptCore/Scripts/hasher.py:1 > +#! /usr/bin/python
Ditto on shebang.
> Source/JavaScriptCore/Scripts/hasher.py:27 > +# Boston, MA 02110-1301, USA.
Isn't this all newly translated code?
> Source/JavaScriptCore/yarr/YarrInterpreter.cpp:307 > + size_t high = matches.size() - 1;
Is it ever empty?
> Source/JavaScriptCore/yarr/YarrInterpreter.cpp:336 > + size_t high = ranges.size() - 1;
Ditto.
> Source/JavaScriptCore/yarr/YarrInterpreter.cpp:354 > + const size_t thresholdForBinarySearch = 6;
I guess it's never empty because of this! Did you choose 6 out of some scientific measurement? Or pulled the number from a Magical Place?
> Source/JavaScriptCore/yarr/YarrParser.h:458 > + // m_err should be set tryConsumeUnicodePropertyExpression()
I don't get this comment. "set by" ?
> Source/JavaScriptCore/yarr/YarrPattern.cpp:98 > + addSortedMatchOrRange(min, max + 1);
Can this ever overflow max? Ditto below for the arithmetic.
> Source/JavaScriptCore/yarr/YarrPattern.cpp:418 > void atomBuiltInCharacterClass(BuiltInCharacterClassID classID, bool invert)
I find calls to this hard to read because of the boolean param. Could you make it an enum class CharacterClassInvert { Do, Dont }; or something similar?
> Source/JavaScriptCore/yarr/YarrPattern.cpp:1389 > +
Extra
> Source/JavaScriptCore/yarr/YarrPattern.h:50 > + }
You could just set these values above, and then rely on the compiler auto-generating the default ctor properly.
> Source/JavaScriptCore/yarr/YarrUnicodeProperties.cpp:52 > + ALWAYS_INLINE int entry(WTF::String key) const
const WTF::String & ?
> Source/JavaScriptCore/yarr/YarrUnicodeProperties.cpp:75 > +std::optional<BuiltInCharacterClassID> unicodeMatchPropertyValue(WTF::String unicodePropertyName, WTF::String unicodePropertyValue)
const WTF::String &
> Source/JavaScriptCore/yarr/YarrUnicodeProperties.cpp:92 > +std::optional<BuiltInCharacterClassID> unicodeMatchProperty(WTF::String unicodePropertyValue)
const WTF::String &
Michael Saboff
Comment 10
2017-10-06 16:51:21 PDT
Comment on
attachment 323051
[details]
Main Patch View in context:
https://bugs.webkit.org/attachment.cgi?id=323051&action=review
>> Source/JavaScriptCore/ChangeLog:22 >> + names to the function index. Using these hashing tables, we can lookup up a property > > up up up up
down with "up" - removed.
>> Source/JavaScriptCore/Scripts/generateYarrUnicodePropertyTables.py:1 >> +#! /usr/bin/python > > #!/usr/bin/env python
Fixed
>> Source/JavaScriptCore/Scripts/generateYarrUnicodePropertyTables.py:62 >> +// DO NO EDIT! - This file was generated by generateYarrUnicodePropertyTables.py > > """ + __file__ + """
Fixed
>> Source/JavaScriptCore/Scripts/generateYarrUnicodePropertyTables.py:117 >> + exit(1) > > This is all a lot of error handling for a python script! Such dedication, I'd just `with open(...) as ...:` and call it a day. :)
I do this since there are several UCD files that must be present for this script to run.
>> Source/JavaScriptCore/Scripts/hasher.py:1 >> +#! /usr/bin/python > > Ditto on shebang.
Fixed
>> Source/JavaScriptCore/Scripts/hasher.py:27 >> +# Boston, MA 02110-1301, USA. > > Isn't this all newly translated code?
I translated it from the Perl and C++ code we already have. I believe that the license is required.
>> Source/JavaScriptCore/yarr/YarrInterpreter.cpp:307 >> + size_t high = matches.size() - 1; > > Is it ever empty?
No, see below.
>> Source/JavaScriptCore/yarr/YarrInterpreter.cpp:354 >> + const size_t thresholdForBinarySearch = 6; > > I guess it's never empty because of this! > > Did you choose 6 out of some scientific measurement? Or pulled the number from a Magical Place?
I tried various numbers, but didn't do rigorous testing before settling on 6.
>> Source/JavaScriptCore/yarr/YarrParser.h:458 >> + // m_err should be set tryConsumeUnicodePropertyExpression() > > I don't get this comment. "set by" ?
Yes, "set by". Since it only sets on error, I changed this to: "tryConsumeUnicodePropertyExpression() will set m_err for a malformed property expression "
>> Source/JavaScriptCore/yarr/YarrPattern.cpp:98 >> + addSortedMatchOrRange(min, max + 1); > > Can this ever overflow max? Ditto below for the arithmetic.
Not with in the foreseeable future range of Unicode characters. The current max is 0x10ffff, well below 2^32-1.
>> Source/JavaScriptCore/yarr/YarrPattern.cpp:418 >> void atomBuiltInCharacterClass(BuiltInCharacterClassID classID, bool invert) > > I find calls to this hard to read because of the boolean param. Could you make it an enum class CharacterClassInvert { Do, Dont }; or something similar?
This is a pre-existing pattern. It may be this way since this is a delegate function that is part of a template parameter class.
>> Source/JavaScriptCore/yarr/YarrPattern.cpp:1389 >> + > > Extra
Removed.
>> Source/JavaScriptCore/yarr/YarrPattern.h:50 >> + } > > You could just set these values above, and then rely on the compiler auto-generating the default ctor properly.
Done
>> Source/JavaScriptCore/yarr/YarrUnicodeProperties.cpp:52 >> + ALWAYS_INLINE int entry(WTF::String key) const > > const WTF::String & ?
Done
>> Source/JavaScriptCore/yarr/YarrUnicodeProperties.cpp:75 >> +std::optional<BuiltInCharacterClassID> unicodeMatchPropertyValue(WTF::String unicodePropertyName, WTF::String unicodePropertyValue) > > const WTF::String &
That breaks the compile as this is called with the result from another function.
Michael Saboff
Comment 11
2017-10-06 17:21:03 PDT
Created
attachment 323072
[details]
Patch for EWS with suggested changes and speculative build fixes.
Build Bot
Comment 12
2017-10-06 17:24:26 PDT
Attachment 323072
[details]
did not pass style-queue: ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:738: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:996: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:1135: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:1692: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/ScriptExtensions.txt:38: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/yarr/YarrPattern.cpp:1026: One space before end of line comments [whitespace/comments] [5] ERROR: Source/JavaScriptCore/yarr/YarrUnicodeProperties.cpp:72: Alphabetical sorting problem. [build/include_order] [4] ERROR: Source/JavaScriptCore/ucd/Scripts.txt:18: Line contains tab character. [whitespace/tab] [5] Total errors found: 8 in 32 files If any of these errors are false positives, please file a bug against check-webkit-style.
Michael Saboff
Comment 13
2017-10-09 11:48:44 PDT
Created
attachment 323194
[details]
Patch for EWS with updated build fixes
Build Bot
Comment 14
2017-10-09 11:52:16 PDT
Attachment 323194
[details]
did not pass style-queue: ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:738: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:996: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:1135: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:1692: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/ScriptExtensions.txt:38: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/yarr/YarrPattern.cpp:1026: One space before end of line comments [whitespace/comments] [5] ERROR: Source/JavaScriptCore/yarr/YarrUnicodeProperties.cpp:72: Alphabetical sorting problem. [build/include_order] [4] ERROR: Source/JavaScriptCore/ucd/Scripts.txt:18: Line contains tab character. [whitespace/tab] [5] Total errors found: 8 in 32 files If any of these errors are false positives, please file a bug against check-webkit-style.
Build Bot
Comment 15
2017-10-09 13:09:27 PDT
Comment on
attachment 323194
[details]
Patch for EWS with updated build fixes
Attachment 323194
[details]
did not pass mac-ews (mac): Output:
http://webkit-queues.webkit.org/results/4802520
New failing tests: js/regexp-unicode-properties.html
Build Bot
Comment 16
2017-10-09 13:09:29 PDT
Created
attachment 323202
[details]
Archive of layout-test-results from ews100 for mac-elcapitan The attached test failures were seen while running run-webkit-tests on the mac-ews. Bot: ews100 Port: mac-elcapitan Platform: Mac OS X 10.11.6
Build Bot
Comment 17
2017-10-09 13:13:06 PDT
Comment on
attachment 323194
[details]
Patch for EWS with updated build fixes
Attachment 323194
[details]
did not pass mac-wk2-ews (mac-wk2): Output:
http://webkit-queues.webkit.org/results/4802548
New failing tests: js/regexp-unicode-properties.html
Build Bot
Comment 18
2017-10-09 13:13:08 PDT
Created
attachment 323206
[details]
Archive of layout-test-results from ews105 for mac-elcapitan-wk2 The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews. Bot: ews105 Port: mac-elcapitan-wk2 Platform: Mac OS X 10.11.6
Michael Saboff
Comment 19
2017-10-09 13:19:47 PDT
Created
attachment 323207
[details]
Patch for EWS with test fix Converted UTF-8 strings in test file to Unicode escapes.
Build Bot
Comment 20
2017-10-09 13:22:18 PDT
Attachment 323207
[details]
did not pass style-queue: ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:738: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:996: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:1135: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:1692: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/ScriptExtensions.txt:38: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/yarr/YarrPattern.cpp:1026: One space before end of line comments [whitespace/comments] [5] ERROR: Source/JavaScriptCore/yarr/YarrUnicodeProperties.cpp:72: Alphabetical sorting problem. [build/include_order] [4] ERROR: Source/JavaScriptCore/ucd/Scripts.txt:18: Line contains tab character. [whitespace/tab] [5] Total errors found: 8 in 32 files If any of these errors are false positives, please file a bug against check-webkit-style.
Michael Saboff
Comment 21
2017-10-09 16:14:48 PDT
Committed
r223081
: <
http://trac.webkit.org/changeset/223081
>
Ryan Haddad
Comment 22
2017-10-09 18:12:00 PDT
The following tests were marked as passing, but are failing on the Release test262 bot: test262.yaml/test262/test/built-ins/RegExp/named-groups/unicode-property-names.js.default-strict: ERROR: Unexpected exit code: 3 test262.yaml/test262/test/built-ins/RegExp/named-groups/unicode-property-names.js.default: ERROR: Unexpected exit code: 3 test262.yaml/test262/test/built-ins/RegExp/named-groups/unicode-references.js.default-strict: ERROR: Unexpected exit code: 3 test262.yaml/test262/test/built-ins/RegExp/named-groups/unicode-references.js.default: ERROR: Unexpected exit code: 3
https://build.webkit.org/builders/Apple%20El%20Capitan%20Release%20Test262%20%28Tests%29/builds/4731
Michael Saboff
Comment 23
2017-10-11 11:41:11 PDT
(In reply to Ryan Haddad from
comment #22
)
> The following tests were marked as passing, but are failing on the Release > test262 bot: > > test262.yaml/test262/test/built-ins/RegExp/named-groups/unicode-property- > names.js.default-strict: ERROR: Unexpected exit code: 3 > test262.yaml/test262/test/built-ins/RegExp/named-groups/unicode-property- > names.js.default: ERROR: Unexpected exit code: 3 > test262.yaml/test262/test/built-ins/RegExp/named-groups/unicode-references. > js.default-strict: ERROR: Unexpected exit code: 3 > test262.yaml/test262/test/built-ins/RegExp/named-groups/unicode-references. > js.default: ERROR: Unexpected exit code: 3 > >
https://build.webkit.org/builders/
> Apple%20El%20Capitan%20Release%20Test262%20%28Tests%29/builds/4731
Failing tests tracked in
https://bugs.webkit.org/show_bug.cgi?id=178177
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug