Bug 180538 - YARR: JIT RegExps with greedy parenthesized sub patterns
Summary: YARR: JIT RegExps with greedy parenthesized sub patterns
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: Other
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Michael Saboff
URL:
Keywords: InRadar
Depends on:
Blocks: 179230
  Show dependency treegraph
 
Reported: 2017-12-07 11:38 PST by Michael Saboff
Modified: 2017-12-11 14:46 PST (History)
9 users (show)

See Also:


Attachments
Patch (79.93 KB, patch)
2017-12-07 16:54 PST, Michael Saboff
ews-watchlist: commit-queue-
Details | Formatted Diff | Diff
Archive of layout-test-results from ews114 for mac-elcapitan (3.06 MB, application/zip)
2017-12-07 18:19 PST, EWS Watchlist
no flags Details
Fixed test failure. (80.00 KB, patch)
2017-12-07 19:37 PST, Michael Saboff
jfbastien: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Michael Saboff 2017-12-07 11:38:21 PST
We currently do not JIT code for patterns like /(a)*(b)*/.  Instead we handle those in the YARR interpreter which is much slower.
Comment 1 Radar WebKit Bug Importer 2017-12-07 11:39:33 PST
<rdar://problem/35914716>
Comment 2 Michael Saboff 2017-12-07 16:54:24 PST
Created attachment 328755 [details]
Patch
Comment 3 EWS Watchlist 2017-12-07 18:19:39 PST
Comment on attachment 328755 [details]
Patch

Attachment 328755 [details] did not pass mac-debug-ews (mac):
Output: http://webkit-queues.webkit.org/results/5536664

New failing tests:
js/regexp-zero-length-alternatives.html
Comment 4 EWS Watchlist 2017-12-07 18:19:40 PST
Created attachment 328769 [details]
Archive of layout-test-results from ews114 for mac-elcapitan

The attached test failures were seen while running run-webkit-tests on the mac-debug-ews.
Bot: ews114  Port: mac-elcapitan  Platform: Mac OS X 10.11.6
Comment 5 Michael Saboff 2017-12-07 19:37:16 PST
Created attachment 328782 [details]
Fixed test failure.

The test failure was due a JIT'ed ASSERT emitted for Debug builds.  It caught the case where we didn't restore the parenthesis start index when backtracking.
Comment 6 JF Bastien 2017-12-08 10:08:32 PST
Comment on attachment 328782 [details]
Fixed test failure.

View in context: https://bugs.webkit.org/attachment.cgi?id=328782&action=review

r=me

> Source/JavaScriptCore/testRegExp.cpp:438
> +        const char* regexpError;

Seems less error-prone to set to nullptr here, instead of in parseRegExpLine.

> Source/JavaScriptCore/yarr/YarrJIT.cpp:187
> +            return sizeof(ParenContext) + sizeof(Subpatterns) * parenContextSizes.numSubpatterns() + sizeof(uintptr_t) * parenContextSizes.frameSlots();

Do we cap these enough so that they can't overflow? We deal with 32-bit values above, so even if this is size_t (because of sizeof) I can't convince myself we can't overflow.

> Source/JavaScriptCore/yarr/YarrJIT.cpp:263
> +            for (unsigned subpattern = firstSubpattern; subpattern <= lastSubpattern; subpattern++) {

subpattern <= lastSubpattern range seems odd! I guess it's correct since it's used below as well?

> Source/JavaScriptCore/yarr/YarrJIT.cpp:283
> +            for (unsigned subpattern = firstSubpattern; subpattern <= lastSubpattern; subpattern++) {

(here)

> Source/JavaScriptCore/yarr/YarrJIT.cpp:289
> +        for (unsigned frameLocation = subpatternBaseFrameLocation; frameLocation < m_parenContextSizes.frameSlots(); frameLocation++) {

Versus closed range here :-)

> Source/JavaScriptCore/yarr/YarrJIT.cpp:2218
> +                ASSERT_NOT_REACHED();

RELEASE_ ?

> Source/JavaScriptCore/yarr/YarrJIT.cpp:2276
> +                ASSERT_NOT_REACHED();

RELEASE_ ?

> Source/JavaScriptCore/yarr/YarrJIT.cpp:3421
> +            move(TrustedImm32(1000000), remainingMatchCount);

matchLimit from Yarr.h instead?

> Source/JavaScriptCore/yarr/YarrJIT.h:43
> +const size_t patternContextBufferSize = 8192; // Space caller allocates to save nested parenthesis context

I would make this constexpr so it clearly doesn't change, since you use it in a variable-sized array.
Comment 7 Michael Saboff 2017-12-08 12:26:14 PST
Comment on attachment 328782 [details]
Fixed test failure.

View in context: https://bugs.webkit.org/attachment.cgi?id=328782&action=review

>> Source/JavaScriptCore/testRegExp.cpp:438
>> +        const char* regexpError;
> 
> Seems less error-prone to set to nullptr here, instead of in parseRegExpLine.

Fixed.

>> Source/JavaScriptCore/yarr/YarrJIT.cpp:187
>> +            return sizeof(ParenContext) + sizeof(Subpatterns) * parenContextSizes.numSubpatterns() + sizeof(uintptr_t) * parenContextSizes.frameSlots();
> 
> Do we cap these enough so that they can't overflow? We deal with 32-bit values above, so even if this is size_t (because of sizeof) I can't convince myself we can't overflow.

I'll add a check where we create the free list from sizeOf() and fall back to the interpreter on unreasonable values.

>> Source/JavaScriptCore/yarr/YarrJIT.cpp:263
>> +            for (unsigned subpattern = firstSubpattern; subpattern <= lastSubpattern; subpattern++) {
> 
> subpattern <= lastSubpattern range seems odd! I guess it's correct since it's used below as well?

lastSubpattern is an inclusive value.  It is the last allocated subpattern.

>> Source/JavaScriptCore/yarr/YarrJIT.cpp:289
>> +        for (unsigned frameLocation = subpatternBaseFrameLocation; frameLocation < m_parenContextSizes.frameSlots(); frameLocation++) {
> 
> Versus closed range here :-)

frameSlots() is the number of frame slots, not the last one.

>> Source/JavaScriptCore/yarr/YarrJIT.cpp:2218
>> +                ASSERT_NOT_REACHED();
> 
> RELEASE_ ?

Changed.  This is a defensive check, since these node types are not emitted when JIT_ALL_PARENS_EXPRESSIONS is not defined.

>> Source/JavaScriptCore/yarr/YarrJIT.cpp:2276
>> +                ASSERT_NOT_REACHED();
> 
> RELEASE_ ?

Ditto.

>> Source/JavaScriptCore/yarr/YarrJIT.cpp:3421
>> +            move(TrustedImm32(1000000), remainingMatchCount);
> 
> matchLimit from Yarr.h instead?

Done

>> Source/JavaScriptCore/yarr/YarrJIT.h:43
>> +const size_t patternContextBufferSize = 8192; // Space caller allocates to save nested parenthesis context
> 
> I would make this constexpr so it clearly doesn't change, since you use it in a variable-sized array.

Done
Comment 8 Michael Saboff 2017-12-08 12:32:44 PST
Committed r225695: <https://trac.webkit.org/changeset/225695>
Comment 9 Yusuke Suzuki 2017-12-09 11:43:13 PST
I think recent Yarr changes improve Speedometer/Ember! Last time I checked Ember, significant time is consumed in Yarr Interpreter. It meant Ember.js includes some RegExps which are not compiled in Yarr JIT.
Comment 10 Matt Lewis 2017-12-11 14:46:42 PST
This caused a break in the High Sierra 32-bit JSC build.

https://build.webkit.org/builders/Apple%20High%20Sierra%2032-bit%20JSC%20%28BuildAndTest%29/builds/671
https://build.webkit.org/builders/Apple%20High%20Sierra%2032-bit%20JSC%20%28BuildAndTest%29/builds/671/steps/webkit-32bit-jsc-test/logs/stdio

failure:
ChakraCore.yaml/ChakraCore/test/es6/regex-unicode.js.default

ChakraCore.yaml/ChakraCore/test/es6/regex-unicode.js.default: Detected failures:
ChakraCore.yaml/ChakraCore/test/es6/regex-unicode.js.default: *** Running test #4 (3): A character set containing a range spanning multiple planes should match characters from all those planes
ChakraCore.yaml/ChakraCore/test/es6/regex-unicode.js.default: Test threw exception: assert.isTrue failed: expected: true actual: false: First character in plane #1
ChakraCore.yaml/ChakraCore/test/es6/regex-unicode.js.default: FAILED
ChakraCore.yaml/ChakraCore/test/es6/regex-unicode.js.default: Summary of tests: total executed: 7; passed: 6; failed: 1
FAIL: ChakraCore.yaml/ChakraCore/test/es6/regex-unicode.js.default