Bug 172069

Summary: Implement RegExp Unicode property escapes
Product: WebKit Reporter: Mathias Bynens <mathias>
Component: JavaScriptCoreAssignee: Michael Saboff <msaboff>
Status: RESOLVED FIXED    
Severity: Normal CC: buildbot, fpizlo, jfbastien, keith_miller, mark.lam, mathias, msaboff, rniwa, ryanhaddad, saam, webkit-bug-importer, ysuzuki
Priority: P2 Keywords: InRadar
Version: WebKit Nightly Build   
Hardware: Unspecified   
OS: Unspecified   
URL: https://github.com/tc39/proposal-regexp-unicode-property-escapes
See Also: https://github.com/Microsoft/ChakraCore/issues/2969
https://bugzilla.mozilla.org/show_bug.cgi?id=1361876
https://bugs.chromium.org/p/v8/issues/detail?id=4743
Attachments:
Description Flags
UCD Datefiles for Patch
none
Main Patch
jfbastien: review+
Combined patch for EWS
none
Patch for EWS with suggested changes and speculative build fixes.
none
Patch for EWS with updated build fixes
buildbot: commit-queue-
Archive of layout-test-results from ews100 for mac-elcapitan
none
Archive of layout-test-results from ews105 for mac-elcapitan-wk2
none
Patch for EWS with test fix none

Mathias Bynens
Reported 2017-05-13 02:42:15 PDT
Attachments
UCD Datefiles for Patch (3.84 MB, patch)
2017-10-06 15:11 PDT, Michael Saboff
no flags
Main Patch (399.79 KB, patch)
2017-10-06 15:12 PDT, Michael Saboff
jfbastien: review+
Combined patch for EWS (4.23 MB, patch)
2017-10-06 15:16 PDT, Michael Saboff
no flags
Patch for EWS with suggested changes and speculative build fixes. (4.23 MB, patch)
2017-10-06 17:21 PDT, Michael Saboff
no flags
Patch for EWS with updated build fixes (4.23 MB, patch)
2017-10-09 11:48 PDT, Michael Saboff
buildbot: commit-queue-
Archive of layout-test-results from ews100 for mac-elcapitan (991.82 KB, application/zip)
2017-10-09 13:09 PDT, Build Bot
no flags
Archive of layout-test-results from ews105 for mac-elcapitan-wk2 (1.12 MB, application/zip)
2017-10-09 13:13 PDT, Build Bot
no flags
Patch for EWS with test fix (4.23 MB, patch)
2017-10-09 13:19 PDT, Michael Saboff
no flags
Radar WebKit Bug Importer
Comment 1 2017-05-13 20:23:48 PDT
Michael Saboff
Comment 2 2017-09-27 06:43:09 PDT
Active Radar for the bug is <rdar://problem/33183184>
Michael Saboff
Comment 3 2017-10-06 15:11:54 PDT
Created attachment 323050 [details] UCD Datefiles for Patch These are Unicode Database version 10 files that are input for the main patch.
Michael Saboff
Comment 4 2017-10-06 15:12:15 PDT
Created attachment 323051 [details] Main Patch
Build Bot
Comment 5 2017-10-06 15:14:39 PDT
Attachment 323051 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/yarr/YarrUnicodeProperties.cpp:72: Alphabetical sorting problem. [build/include_order] [4] ERROR: Source/JavaScriptCore/yarr/YarrPattern.cpp:1026: One space before end of line comments [whitespace/comments] [5] Total errors found: 2 in 21 files If any of these errors are false positives, please file a bug against check-webkit-style.
Michael Saboff
Comment 6 2017-10-06 15:16:55 PDT
Created attachment 323053 [details] Combined patch for EWS Review the "Main Patch".
Build Bot
Comment 7 2017-10-06 15:19:18 PDT
Attachment 323053 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:738: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:996: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:1135: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:1692: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/ScriptExtensions.txt:38: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/yarr/YarrPattern.cpp:1026: One space before end of line comments [whitespace/comments] [5] ERROR: Source/JavaScriptCore/yarr/YarrUnicodeProperties.cpp:72: Alphabetical sorting problem. [build/include_order] [4] ERROR: Source/JavaScriptCore/ucd/Scripts.txt:18: Line contains tab character. [whitespace/tab] [5] Total errors found: 8 in 31 files If any of these errors are false positives, please file a bug against check-webkit-style.
Build Bot
Comment 8 2017-10-06 15:29:07 PDT
Attachment 323050 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:738: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:996: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:1135: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:1692: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/Scripts.txt:18: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/ScriptExtensions.txt:38: Line contains tab character. [whitespace/tab] [5] Total errors found: 6 in 10 files If any of these errors are false positives, please file a bug against check-webkit-style.
JF Bastien
Comment 9 2017-10-06 15:53:39 PDT
Comment on attachment 323051 [details] Main Patch View in context: https://bugs.webkit.org/attachment.cgi?id=323051&action=review r=me > Source/JavaScriptCore/ChangeLog:22 > + names to the function index. Using these hashing tables, we can lookup up a property up up up up > Source/JavaScriptCore/Scripts/generateYarrUnicodePropertyTables.py:1 > +#! /usr/bin/python #!/usr/bin/env python > Source/JavaScriptCore/Scripts/generateYarrUnicodePropertyTables.py:62 > +// DO NO EDIT! - This file was generated by generateYarrUnicodePropertyTables.py """ + __file__ + """ > Source/JavaScriptCore/Scripts/generateYarrUnicodePropertyTables.py:117 > + exit(1) This is all a lot of error handling for a python script! Such dedication, I'd just `with open(...) as ...:` and call it a day. :) > Source/JavaScriptCore/Scripts/hasher.py:1 > +#! /usr/bin/python Ditto on shebang. > Source/JavaScriptCore/Scripts/hasher.py:27 > +# Boston, MA 02110-1301, USA. Isn't this all newly translated code? > Source/JavaScriptCore/yarr/YarrInterpreter.cpp:307 > + size_t high = matches.size() - 1; Is it ever empty? > Source/JavaScriptCore/yarr/YarrInterpreter.cpp:336 > + size_t high = ranges.size() - 1; Ditto. > Source/JavaScriptCore/yarr/YarrInterpreter.cpp:354 > + const size_t thresholdForBinarySearch = 6; I guess it's never empty because of this! Did you choose 6 out of some scientific measurement? Or pulled the number from a Magical Place? > Source/JavaScriptCore/yarr/YarrParser.h:458 > + // m_err should be set tryConsumeUnicodePropertyExpression() I don't get this comment. "set by" ? > Source/JavaScriptCore/yarr/YarrPattern.cpp:98 > + addSortedMatchOrRange(min, max + 1); Can this ever overflow max? Ditto below for the arithmetic. > Source/JavaScriptCore/yarr/YarrPattern.cpp:418 > void atomBuiltInCharacterClass(BuiltInCharacterClassID classID, bool invert) I find calls to this hard to read because of the boolean param. Could you make it an enum class CharacterClassInvert { Do, Dont }; or something similar? > Source/JavaScriptCore/yarr/YarrPattern.cpp:1389 > + Extra > Source/JavaScriptCore/yarr/YarrPattern.h:50 > + } You could just set these values above, and then rely on the compiler auto-generating the default ctor properly. > Source/JavaScriptCore/yarr/YarrUnicodeProperties.cpp:52 > + ALWAYS_INLINE int entry(WTF::String key) const const WTF::String & ? > Source/JavaScriptCore/yarr/YarrUnicodeProperties.cpp:75 > +std::optional<BuiltInCharacterClassID> unicodeMatchPropertyValue(WTF::String unicodePropertyName, WTF::String unicodePropertyValue) const WTF::String & > Source/JavaScriptCore/yarr/YarrUnicodeProperties.cpp:92 > +std::optional<BuiltInCharacterClassID> unicodeMatchProperty(WTF::String unicodePropertyValue) const WTF::String &
Michael Saboff
Comment 10 2017-10-06 16:51:21 PDT
Comment on attachment 323051 [details] Main Patch View in context: https://bugs.webkit.org/attachment.cgi?id=323051&action=review >> Source/JavaScriptCore/ChangeLog:22 >> + names to the function index. Using these hashing tables, we can lookup up a property > > up up up up down with "up" - removed. >> Source/JavaScriptCore/Scripts/generateYarrUnicodePropertyTables.py:1 >> +#! /usr/bin/python > > #!/usr/bin/env python Fixed >> Source/JavaScriptCore/Scripts/generateYarrUnicodePropertyTables.py:62 >> +// DO NO EDIT! - This file was generated by generateYarrUnicodePropertyTables.py > > """ + __file__ + """ Fixed >> Source/JavaScriptCore/Scripts/generateYarrUnicodePropertyTables.py:117 >> + exit(1) > > This is all a lot of error handling for a python script! Such dedication, I'd just `with open(...) as ...:` and call it a day. :) I do this since there are several UCD files that must be present for this script to run. >> Source/JavaScriptCore/Scripts/hasher.py:1 >> +#! /usr/bin/python > > Ditto on shebang. Fixed >> Source/JavaScriptCore/Scripts/hasher.py:27 >> +# Boston, MA 02110-1301, USA. > > Isn't this all newly translated code? I translated it from the Perl and C++ code we already have. I believe that the license is required. >> Source/JavaScriptCore/yarr/YarrInterpreter.cpp:307 >> + size_t high = matches.size() - 1; > > Is it ever empty? No, see below. >> Source/JavaScriptCore/yarr/YarrInterpreter.cpp:354 >> + const size_t thresholdForBinarySearch = 6; > > I guess it's never empty because of this! > > Did you choose 6 out of some scientific measurement? Or pulled the number from a Magical Place? I tried various numbers, but didn't do rigorous testing before settling on 6. >> Source/JavaScriptCore/yarr/YarrParser.h:458 >> + // m_err should be set tryConsumeUnicodePropertyExpression() > > I don't get this comment. "set by" ? Yes, "set by". Since it only sets on error, I changed this to: "tryConsumeUnicodePropertyExpression() will set m_err for a malformed property expression " >> Source/JavaScriptCore/yarr/YarrPattern.cpp:98 >> + addSortedMatchOrRange(min, max + 1); > > Can this ever overflow max? Ditto below for the arithmetic. Not with in the foreseeable future range of Unicode characters. The current max is 0x10ffff, well below 2^32-1. >> Source/JavaScriptCore/yarr/YarrPattern.cpp:418 >> void atomBuiltInCharacterClass(BuiltInCharacterClassID classID, bool invert) > > I find calls to this hard to read because of the boolean param. Could you make it an enum class CharacterClassInvert { Do, Dont }; or something similar? This is a pre-existing pattern. It may be this way since this is a delegate function that is part of a template parameter class. >> Source/JavaScriptCore/yarr/YarrPattern.cpp:1389 >> + > > Extra Removed. >> Source/JavaScriptCore/yarr/YarrPattern.h:50 >> + } > > You could just set these values above, and then rely on the compiler auto-generating the default ctor properly. Done >> Source/JavaScriptCore/yarr/YarrUnicodeProperties.cpp:52 >> + ALWAYS_INLINE int entry(WTF::String key) const > > const WTF::String & ? Done >> Source/JavaScriptCore/yarr/YarrUnicodeProperties.cpp:75 >> +std::optional<BuiltInCharacterClassID> unicodeMatchPropertyValue(WTF::String unicodePropertyName, WTF::String unicodePropertyValue) > > const WTF::String & That breaks the compile as this is called with the result from another function.
Michael Saboff
Comment 11 2017-10-06 17:21:03 PDT
Created attachment 323072 [details] Patch for EWS with suggested changes and speculative build fixes.
Build Bot
Comment 12 2017-10-06 17:24:26 PDT
Attachment 323072 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:738: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:996: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:1135: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:1692: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/ScriptExtensions.txt:38: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/yarr/YarrPattern.cpp:1026: One space before end of line comments [whitespace/comments] [5] ERROR: Source/JavaScriptCore/yarr/YarrUnicodeProperties.cpp:72: Alphabetical sorting problem. [build/include_order] [4] ERROR: Source/JavaScriptCore/ucd/Scripts.txt:18: Line contains tab character. [whitespace/tab] [5] Total errors found: 8 in 32 files If any of these errors are false positives, please file a bug against check-webkit-style.
Michael Saboff
Comment 13 2017-10-09 11:48:44 PDT
Created attachment 323194 [details] Patch for EWS with updated build fixes
Build Bot
Comment 14 2017-10-09 11:52:16 PDT
Attachment 323194 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:738: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:996: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:1135: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:1692: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/ScriptExtensions.txt:38: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/yarr/YarrPattern.cpp:1026: One space before end of line comments [whitespace/comments] [5] ERROR: Source/JavaScriptCore/yarr/YarrUnicodeProperties.cpp:72: Alphabetical sorting problem. [build/include_order] [4] ERROR: Source/JavaScriptCore/ucd/Scripts.txt:18: Line contains tab character. [whitespace/tab] [5] Total errors found: 8 in 32 files If any of these errors are false positives, please file a bug against check-webkit-style.
Build Bot
Comment 15 2017-10-09 13:09:27 PDT
Comment on attachment 323194 [details] Patch for EWS with updated build fixes Attachment 323194 [details] did not pass mac-ews (mac): Output: http://webkit-queues.webkit.org/results/4802520 New failing tests: js/regexp-unicode-properties.html
Build Bot
Comment 16 2017-10-09 13:09:29 PDT
Created attachment 323202 [details] Archive of layout-test-results from ews100 for mac-elcapitan The attached test failures were seen while running run-webkit-tests on the mac-ews. Bot: ews100 Port: mac-elcapitan Platform: Mac OS X 10.11.6
Build Bot
Comment 17 2017-10-09 13:13:06 PDT
Comment on attachment 323194 [details] Patch for EWS with updated build fixes Attachment 323194 [details] did not pass mac-wk2-ews (mac-wk2): Output: http://webkit-queues.webkit.org/results/4802548 New failing tests: js/regexp-unicode-properties.html
Build Bot
Comment 18 2017-10-09 13:13:08 PDT
Created attachment 323206 [details] Archive of layout-test-results from ews105 for mac-elcapitan-wk2 The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews. Bot: ews105 Port: mac-elcapitan-wk2 Platform: Mac OS X 10.11.6
Michael Saboff
Comment 19 2017-10-09 13:19:47 PDT
Created attachment 323207 [details] Patch for EWS with test fix Converted UTF-8 strings in test file to Unicode escapes.
Build Bot
Comment 20 2017-10-09 13:22:18 PDT
Attachment 323207 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:738: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:996: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:1135: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/DerivedNormalizationProps.txt:1692: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/ucd/ScriptExtensions.txt:38: Line contains tab character. [whitespace/tab] [5] ERROR: Source/JavaScriptCore/yarr/YarrPattern.cpp:1026: One space before end of line comments [whitespace/comments] [5] ERROR: Source/JavaScriptCore/yarr/YarrUnicodeProperties.cpp:72: Alphabetical sorting problem. [build/include_order] [4] ERROR: Source/JavaScriptCore/ucd/Scripts.txt:18: Line contains tab character. [whitespace/tab] [5] Total errors found: 8 in 32 files If any of these errors are false positives, please file a bug against check-webkit-style.
Michael Saboff
Comment 21 2017-10-09 16:14:48 PDT
Ryan Haddad
Comment 22 2017-10-09 18:12:00 PDT
The following tests were marked as passing, but are failing on the Release test262 bot: test262.yaml/test262/test/built-ins/RegExp/named-groups/unicode-property-names.js.default-strict: ERROR: Unexpected exit code: 3 test262.yaml/test262/test/built-ins/RegExp/named-groups/unicode-property-names.js.default: ERROR: Unexpected exit code: 3 test262.yaml/test262/test/built-ins/RegExp/named-groups/unicode-references.js.default-strict: ERROR: Unexpected exit code: 3 test262.yaml/test262/test/built-ins/RegExp/named-groups/unicode-references.js.default: ERROR: Unexpected exit code: 3 https://build.webkit.org/builders/Apple%20El%20Capitan%20Release%20Test262%20%28Tests%29/builds/4731
Michael Saboff
Comment 23 2017-10-11 11:41:11 PDT
(In reply to Ryan Haddad from comment #22) > The following tests were marked as passing, but are failing on the Release > test262 bot: > > test262.yaml/test262/test/built-ins/RegExp/named-groups/unicode-property- > names.js.default-strict: ERROR: Unexpected exit code: 3 > test262.yaml/test262/test/built-ins/RegExp/named-groups/unicode-property- > names.js.default: ERROR: Unexpected exit code: 3 > test262.yaml/test262/test/built-ins/RegExp/named-groups/unicode-references. > js.default-strict: ERROR: Unexpected exit code: 3 > test262.yaml/test262/test/built-ins/RegExp/named-groups/unicode-references. > js.default: ERROR: Unexpected exit code: 3 > > https://build.webkit.org/builders/ > Apple%20El%20Capitan%20Release%20Test262%20%28Tests%29/builds/4731 Failing tests tracked in https://bugs.webkit.org/show_bug.cgi?id=178177
Note You need to log in before you can comment on or make changes to this bug.