WebKit Bugzilla
New
Browse
Search+
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
180874
YARR: JIT RegExps with back references
https://bugs.webkit.org/show_bug.cgi?id=180874
Summary
YARR: JIT RegExps with back references
Michael Saboff
Reported
2017-12-15 11:30:58 PST
Currently, patterns with back references, e.g. /(a)\1/, are not supported in the Yarr JIT. We should add support for some or all back references. It should be straightforward to support back references without case folding. JIT support for case folding with 8 bit characters shouldn't be to hard as the case folding table is a reasonable size. JIT support of 16-bit and/or Unicode patterns will require calling out to C++ for case folding.
Attachments
Patch
(26.38 KB, patch)
2018-08-31 14:55 PDT
,
Michael Saboff
no flags
Details
Formatted Diff
Diff
Updated pPatch with speculative build fixes
(28.16 KB, patch)
2018-09-01 09:03 PDT
,
Michael Saboff
fpizlo
: review+
Details
Formatted Diff
Diff
Show Obsolete
(1)
View All
Add attachment
proposed patch, testcase, etc.
Radar WebKit Bug Importer
Comment 1
2017-12-15 11:35:50 PST
<
rdar://problem/36078115
>
Michael Saboff
Comment 2
2018-08-31 14:55:44 PDT
Created
attachment 348676
[details]
Patch
Michael Saboff
Comment 3
2018-09-01 09:03:48 PDT
Created
attachment 348713
[details]
Updated pPatch with speculative build fixes
EWS Watchlist
Comment 4
2018-09-01 09:06:07 PDT
Attachment 348713
[details]
did not pass style-queue: ERROR: Source/JavaScriptCore/yarr/YarrJIT.cpp:3818: Multi line control clauses should use braces. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/runtime/RegExp.cpp:308: Multi line control clauses should use braces. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/runtime/RegExp.cpp:368: Multi line control clauses should use braces. [whitespace/braces] [4] Total errors found: 3 in 13 files If any of these errors are false positives, please file a bug against check-webkit-style.
Michael Saboff
Comment 5
2018-09-04 10:23:11 PDT
Comment on
attachment 348713
[details]
Updated pPatch with speculative build fixes View in context:
https://bugs.webkit.org/attachment.cgi?id=348713&action=review
> Source/JavaScriptCore/assembler/MacroAssemblerX86Common.h:1288 > +
I'll revert this whitespace change.
Michael Saboff
Comment 6
2018-09-04 14:19:00 PDT
Committed
r235636
: <
https://trac.webkit.org/changeset/235636
>
Saam Barati
Comment 7
2018-09-04 14:51:26 PDT
Comment on
attachment 348713
[details]
Updated pPatch with speculative build fixes View in context:
https://bugs.webkit.org/attachment.cgi?id=348713&action=review
LGTM too. It would be nice to add some more tests for some of the trickier tests in the JIT.
> Source/JavaScriptCore/yarr/YarrCanonicalizeUCS2.cpp:548 > + 0xb0, 0xb1, 0xb2, 0xb3, 0xb4, 0x39c, 0xb6, 0xb7, 0xb8, 0xb9, 0xba, 0xbb, 0xbc, 0xbd, 0xbe, 0xbf,
Any idea what 0x39c is?
> Source/JavaScriptCore/yarr/YarrCanonicalizeUCS2.cpp:552 > + 0xd0, 0xd1, 0xd2, 0xd3, 0xd4, 0xd5, 0xd6, 0xf7, 0xd8, 0xd9, 0xda, 0xdb, 0xdc, 0xdd, 0xde, 0x178
ditto about 0x178? Can you add tests for these?
> Source/JavaScriptCore/yarr/YarrJIT.cpp:1135 > + readCharacterDontDecodeSurrogates(0, patternCharacter, patternIndex); > + readCharacterDontDecodeSurrogates(m_checkedOffset - term->inputPosition, character);
Indentation is wrong.
> Source/JavaScriptCore/yarr/YarrJIT.cpp:1181 > + load32(Address(output, (subpatternId << 1) * sizeof(int)), patternIndex); > + load32(Address(output, ((subpatternId << 1) + 1) * sizeof(int)), patternTemp);
Why "<< 1" everywhere instead of "* 2"?
> Source/JavaScriptCore/yarr/YarrJIT.cpp:1263 > + load32(Address(output, (subpatternId << 1) * sizeof(int)), patternIndex); > + load32(Address(output, ((subpatternId << 1) + 1) * sizeof(int)), patternTemp);
Why not just use TimesFour for these addresses instead of "* sizeof(int)"? Wouldn't we end up with better instruction encoding?
Michael Saboff
Comment 8
2018-09-04 16:21:17 PDT
Comment on
attachment 348713
[details]
Updated pPatch with speculative build fixes View in context:
https://bugs.webkit.org/attachment.cgi?id=348713&action=review
>> Source/JavaScriptCore/yarr/YarrCanonicalizeUCS2.cpp:548 >> + 0xb0, 0xb1, 0xb2, 0xb3, 0xb4, 0x39c, 0xb6, 0xb7, 0xb8, 0xb9, 0xba, 0xbb, 0xbc, 0xbd, 0xbe, 0xbf, > > Any idea what 0x39c is?
That is the appropriate toUpper() result for 0xb5.
>> Source/JavaScriptCore/yarr/YarrCanonicalizeUCS2.cpp:552 >> + 0xd0, 0xd1, 0xd2, 0xd3, 0xd4, 0xd5, 0xd6, 0xf7, 0xd8, 0xd9, 0xda, 0xdb, 0xdc, 0xdd, 0xde, 0x178 > > ditto about 0x178? > > Can you add tests for these?
That is the correct toUpper() result for 0xff. I'll add tests for these two cases.
>> Source/JavaScriptCore/yarr/YarrJIT.cpp:1135 >> + readCharacterDontDecodeSurrogates(m_checkedOffset - term->inputPosition, character); > > Indentation is wrong.
I'll land a fix for this.
>> Source/JavaScriptCore/yarr/YarrJIT.cpp:1181 >> + load32(Address(output, ((subpatternId << 1) + 1) * sizeof(int)), patternTemp); > > Why "<< 1" everywhere instead of "* 2"?
Following the convention already in this file.
>> Source/JavaScriptCore/yarr/YarrJIT.cpp:1263 >> + load32(Address(output, ((subpatternId << 1) + 1) * sizeof(int)), patternTemp); > > Why not just use TimesFour for these addresses instead of "* sizeof(int)"? Wouldn't we end up with better instruction encoding?
The (subpatternId << 1) * sizeof(int)) becomes an offset of subpattern * 2 * 4. The instruction coding is fine (see last two instructions): 7:OpTerm BackReference pattern #1 {0,...} non-greedy ... 0x29db3e40904e: jmp 0x29db3e4090ba 0x29db3e409053: mov 0x8(%rcx), %r9d 0x29db3e409057: mov 0xc(%rcx), %r10d
Michael Saboff
Comment 9
2018-09-04 17:59:05 PDT
(In reply to Michael Saboff from
comment #8
)
> Comment on
attachment 348713
[details]
> >> Source/JavaScriptCore/yarr/YarrCanonicalizeUCS2.cpp:552 > >> + 0xd0, 0xd1, 0xd2, 0xd3, 0xd4, 0xd5, 0xd6, 0xf7, 0xd8, 0xd9, 0xda, 0xdb, 0xdc, 0xdd, 0xde, 0x178 > > > > ditto about 0x178? > > > > Can you add tests for these?
New tests tracked in
https://bugs.webkit.org/show_bug.cgi?id=189291
.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug