This proposal is currently at Stage 3 and seeking implementations: https://github.com/tc39/proposal-regexp-unicode-property-escapes Tests: https://github.com/tc39/test262/tree/master/test/built-ins/RegExp/property-escapes
<rdar://problem/32182540>
Active Radar for the bug is <rdar://problem/33183184>
Created attachment 323050 [details] UCD Datefiles for Patch These are Unicode Database version 10 files that are input for the main patch.
Created attachment 323051 [details] Main Patch
Attachment 323051 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/yarr/YarrUnicodeProperties.cpp:72: Alphabetical sorting problem. [build/include_order] [4] ERROR: Source/JavaScriptCore/yarr/YarrPattern.cpp:1026: One space before end of line comments [whitespace/comments] [5] Total errors found: 2 in 21 files If any of these errors are false positives, please file a bug against check-webkit-style.
Created attachment 323053 [details] Combined patch for EWS Review the "Main Patch".
Attachment 323053 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:738: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:996: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:1135: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:1692: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/ScriptExtensions.txt:38: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/yarr/YarrPattern.cpp:1026: One space before end of line comments [whitespace/comments] [5] ERROR: Source/JavaScriptCore/yarr/YarrUnicodeProperties.cpp:72: Alphabetical sorting problem. [build/include_order] [4] ERROR: Source/JavaScriptCore/ucd/Scripts.txt:18: Line contains tab character. [whitespace/tab] [5] Total errors found: 8 in 31 files If any of these errors are false positives, please file a bug against check-webkit-style.
Attachment 323050 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:738: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:996: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:1135: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:1692: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/Scripts.txt:18: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/ScriptExtensions.txt:38: Line contains tab character. [whitespace/tab] [5] Total errors found: 6 in 10 files If any of these errors are false positives, please file a bug against check-webkit-style.
Comment on attachment 323051 [details] Main Patch View in context: https://bugs.webkit.org/attachment.cgi?id=323051&action=review r=me > Source/JavaScriptCore/ChangeLog:22 > + names to the function index. Using these hashing tables, we can lookup up a property up up up up > Source/JavaScriptCore/Scripts/generateYarrUnicodePropertyTables.py:1 > +#! /usr/bin/python #!/usr/bin/env python > Source/JavaScriptCore/Scripts/generateYarrUnicodePropertyTables.py:62 > +// DO NO EDIT! - This file was generated by generateYarrUnicodePropertyTables.py """ + __file__ + """ > Source/JavaScriptCore/Scripts/generateYarrUnicodePropertyTables.py:117 > + exit(1) This is all a lot of error handling for a python script! Such dedication, I'd just `with open(...) as ...:` and call it a day. :) > Source/JavaScriptCore/Scripts/hasher.py:1 > +#! /usr/bin/python Ditto on shebang. > Source/JavaScriptCore/Scripts/hasher.py:27 > +# Boston, MA 02110-1301, USA. Isn't this all newly translated code? > Source/JavaScriptCore/yarr/YarrInterpreter.cpp:307 > + size_t high = matches.size() - 1; Is it ever empty? > Source/JavaScriptCore/yarr/YarrInterpreter.cpp:336 > + size_t high = ranges.size() - 1; Ditto. > Source/JavaScriptCore/yarr/YarrInterpreter.cpp:354 > + const size_t thresholdForBinarySearch = 6; I guess it's never empty because of this! Did you choose 6 out of some scientific measurement? Or pulled the number from a Magical Place? > Source/JavaScriptCore/yarr/YarrParser.h:458 > + // m_err should be set tryConsumeUnicodePropertyExpression() I don't get this comment. "set by" ? > Source/JavaScriptCore/yarr/YarrPattern.cpp:98 > + addSortedMatchOrRange(min, max + 1); Can this ever overflow max? Ditto below for the arithmetic. > Source/JavaScriptCore/yarr/YarrPattern.cpp:418 > void atomBuiltInCharacterClass(BuiltInCharacterClassID classID, bool invert) I find calls to this hard to read because of the boolean param. Could you make it an enum class CharacterClassInvert { Do, Dont }; or something similar? > Source/JavaScriptCore/yarr/YarrPattern.cpp:1389 > + Extra > Source/JavaScriptCore/yarr/YarrPattern.h:50 > + } You could just set these values above, and then rely on the compiler auto-generating the default ctor properly. > Source/JavaScriptCore/yarr/YarrUnicodeProperties.cpp:52 > + ALWAYS_INLINE int entry(WTF::String key) const const WTF::String & ? > Source/JavaScriptCore/yarr/YarrUnicodeProperties.cpp:75 > +std::optional<BuiltInCharacterClassID> unicodeMatchPropertyValue(WTF::String unicodePropertyName, WTF::String unicodePropertyValue) const WTF::String & > Source/JavaScriptCore/yarr/YarrUnicodeProperties.cpp:92 > +std::optional<BuiltInCharacterClassID> unicodeMatchProperty(WTF::String unicodePropertyValue) const WTF::String &
Comment on attachment 323051 [details] Main Patch View in context: https://bugs.webkit.org/attachment.cgi?id=323051&action=review >> Source/JavaScriptCore/ChangeLog:22 >> + names to the function index. Using these hashing tables, we can lookup up a property > > up up up up down with "up" - removed. >> Source/JavaScriptCore/Scripts/generateYarrUnicodePropertyTables.py:1 >> +#! /usr/bin/python > > #!/usr/bin/env python Fixed >> Source/JavaScriptCore/Scripts/generateYarrUnicodePropertyTables.py:62 >> +// DO NO EDIT! - This file was generated by generateYarrUnicodePropertyTables.py > > """ + __file__ + """ Fixed >> Source/JavaScriptCore/Scripts/generateYarrUnicodePropertyTables.py:117 >> + exit(1) > > This is all a lot of error handling for a python script! Such dedication, I'd just `with open(...) as ...:` and call it a day. :) I do this since there are several UCD files that must be present for this script to run. >> Source/JavaScriptCore/Scripts/hasher.py:1 >> +#! /usr/bin/python > > Ditto on shebang. Fixed >> Source/JavaScriptCore/Scripts/hasher.py:27 >> +# Boston, MA 02110-1301, USA. > > Isn't this all newly translated code? I translated it from the Perl and C++ code we already have. I believe that the license is required. >> Source/JavaScriptCore/yarr/YarrInterpreter.cpp:307 >> + size_t high = matches.size() - 1; > > Is it ever empty? No, see below. >> Source/JavaScriptCore/yarr/YarrInterpreter.cpp:354 >> + const size_t thresholdForBinarySearch = 6; > > I guess it's never empty because of this! > > Did you choose 6 out of some scientific measurement? Or pulled the number from a Magical Place? I tried various numbers, but didn't do rigorous testing before settling on 6. >> Source/JavaScriptCore/yarr/YarrParser.h:458 >> + // m_err should be set tryConsumeUnicodePropertyExpression() > > I don't get this comment. "set by" ? Yes, "set by". Since it only sets on error, I changed this to: "tryConsumeUnicodePropertyExpression() will set m_err for a malformed property expression " >> Source/JavaScriptCore/yarr/YarrPattern.cpp:98 >> + addSortedMatchOrRange(min, max + 1); > > Can this ever overflow max? Ditto below for the arithmetic. Not with in the foreseeable future range of Unicode characters. The current max is 0x10ffff, well below 2^32-1. >> Source/JavaScriptCore/yarr/YarrPattern.cpp:418 >> void atomBuiltInCharacterClass(BuiltInCharacterClassID classID, bool invert) > > I find calls to this hard to read because of the boolean param. Could you make it an enum class CharacterClassInvert { Do, Dont }; or something similar? This is a pre-existing pattern. It may be this way since this is a delegate function that is part of a template parameter class. >> Source/JavaScriptCore/yarr/YarrPattern.cpp:1389 >> + > > Extra Removed. >> Source/JavaScriptCore/yarr/YarrPattern.h:50 >> + } > > You could just set these values above, and then rely on the compiler auto-generating the default ctor properly. Done >> Source/JavaScriptCore/yarr/YarrUnicodeProperties.cpp:52 >> + ALWAYS_INLINE int entry(WTF::String key) const > > const WTF::String & ? Done >> Source/JavaScriptCore/yarr/YarrUnicodeProperties.cpp:75 >> +std::optional<BuiltInCharacterClassID> unicodeMatchPropertyValue(WTF::String unicodePropertyName, WTF::String unicodePropertyValue) > > const WTF::String & That breaks the compile as this is called with the result from another function.
Created attachment 323072 [details] Patch for EWS with suggested changes and speculative build fixes.
Attachment 323072 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:738: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:996: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:1135: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:1692: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/ScriptExtensions.txt:38: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/yarr/YarrPattern.cpp:1026: One space before end of line comments [whitespace/comments] [5] ERROR: Source/JavaScriptCore/yarr/YarrUnicodeProperties.cpp:72: Alphabetical sorting problem. [build/include_order] [4] ERROR: Source/JavaScriptCore/ucd/Scripts.txt:18: Line contains tab character. [whitespace/tab] [5] Total errors found: 8 in 32 files If any of these errors are false positives, please file a bug against check-webkit-style.
Created attachment 323194 [details] Patch for EWS with updated build fixes
Attachment 323194 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:738: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:996: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:1135: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:1692: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/ScriptExtensions.txt:38: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/yarr/YarrPattern.cpp:1026: One space before end of line comments [whitespace/comments] [5] ERROR: Source/JavaScriptCore/yarr/YarrUnicodeProperties.cpp:72: Alphabetical sorting problem. [build/include_order] [4] ERROR: Source/JavaScriptCore/ucd/Scripts.txt:18: Line contains tab character. [whitespace/tab] [5] Total errors found: 8 in 32 files If any of these errors are false positives, please file a bug against check-webkit-style.
Comment on attachment 323194 [details] Patch for EWS with updated build fixes Attachment 323194 [details] did not pass mac-ews (mac): Output: http://webkit-queues.webkit.org/results/4802520 New failing tests: js/regexp-unicode-properties.html
Created attachment 323202 [details] Archive of layout-test-results from ews100 for mac-elcapitan The attached test failures were seen while running run-webkit-tests on the mac-ews. Bot: ews100 Port: mac-elcapitan Platform: Mac OS X 10.11.6
Comment on attachment 323194 [details] Patch for EWS with updated build fixes Attachment 323194 [details] did not pass mac-wk2-ews (mac-wk2): Output: http://webkit-queues.webkit.org/results/4802548 New failing tests: js/regexp-unicode-properties.html
Created attachment 323206 [details] Archive of layout-test-results from ews105 for mac-elcapitan-wk2 The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews. Bot: ews105 Port: mac-elcapitan-wk2 Platform: Mac OS X 10.11.6
Created attachment 323207 [details] Patch for EWS with test fix Converted UTF-8 strings in test file to Unicode escapes.
Attachment 323207 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:738: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:996: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:1135: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:1692: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/ScriptExtensions.txt:38: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/yarr/YarrPattern.cpp:1026: One space before end of line comments [whitespace/comments] [5] ERROR: Source/JavaScriptCore/yarr/YarrUnicodeProperties.cpp:72: Alphabetical sorting problem. [build/include_order] [4] ERROR: Source/JavaScriptCore/ucd/Scripts.txt:18: Line contains tab character. [whitespace/tab] [5] Total errors found: 8 in 32 files If any of these errors are false positives, please file a bug against check-webkit-style.
Committed r223081: <http://trac.webkit.org/changeset/223081>
The following tests were marked as passing, but are failing on the Release test262 bot: test262.yaml/test262/test/built-ins/RegExp/named-groups/unicode-property-names.js.default-strict: ERROR: Unexpected exit code: 3 test262.yaml/test262/test/built-ins/RegExp/named-groups/unicode-property-names.js.default: ERROR: Unexpected exit code: 3 test262.yaml/test262/test/built-ins/RegExp/named-groups/unicode-references.js.default-strict: ERROR: Unexpected exit code: 3 test262.yaml/test262/test/built-ins/RegExp/named-groups/unicode-references.js.default: ERROR: Unexpected exit code: 3 https://build.webkit.org/builders/Apple%20El%20Capitan%20Release%20Test262%20%28Tests%29/builds/4731
(In reply to Ryan Haddad from comment #22) > The following tests were marked as passing, but are failing on the Release > test262 bot: > > test262.yaml/test262/test/built-ins/RegExp/named-groups/unicode-property- > names.js.default-strict: ERROR: Unexpected exit code: 3 > test262.yaml/test262/test/built-ins/RegExp/named-groups/unicode-property- > names.js.default: ERROR: Unexpected exit code: 3 > test262.yaml/test262/test/built-ins/RegExp/named-groups/unicode-references. > js.default-strict: ERROR: Unexpected exit code: 3 > test262.yaml/test262/test/built-ins/RegExp/named-groups/unicode-references. > js.default: ERROR: Unexpected exit code: 3 > > https://build.webkit.org/builders/ > Apple%20El%20Capitan%20Release%20Test262%20%28Tests%29/builds/4731 Failing tests tracked in https://bugs.webkit.org/show_bug.cgi?id=178177