Currently, patterns with back references, e.g. /(a)\1/, are not supported in the Yarr JIT. We should add support for some or all back references. It should be straightforward to support back references without case folding. JIT support for case folding with 8 bit characters shouldn't be to hard as the case folding table is a reasonable size. JIT support of 16-bit and/or Unicode patterns will require calling out to C++ for case folding.
<rdar://problem/36078115>
Created attachment 348676 [details] Patch
Created attachment 348713 [details] Updated pPatch with speculative build fixes
Attachment 348713 [details] did not pass style-queue: ERROR: Source/JavaScriptCore/yarr/YarrJIT.cpp:3818: Multi line control clauses should use braces. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/runtime/RegExp.cpp:308: Multi line control clauses should use braces. [whitespace/braces] [4] ERROR: Source/JavaScriptCore/runtime/RegExp.cpp:368: Multi line control clauses should use braces. [whitespace/braces] [4] Total errors found: 3 in 13 files If any of these errors are false positives, please file a bug against check-webkit-style.
Comment on attachment 348713 [details] Updated pPatch with speculative build fixes View in context: https://bugs.webkit.org/attachment.cgi?id=348713&action=review > Source/JavaScriptCore/assembler/MacroAssemblerX86Common.h:1288 > + I'll revert this whitespace change.
Committed r235636: <https://trac.webkit.org/changeset/235636>
Comment on attachment 348713 [details] Updated pPatch with speculative build fixes View in context: https://bugs.webkit.org/attachment.cgi?id=348713&action=review LGTM too. It would be nice to add some more tests for some of the trickier tests in the JIT. > Source/JavaScriptCore/yarr/YarrCanonicalizeUCS2.cpp:548 > + 0xb0, 0xb1, 0xb2, 0xb3, 0xb4, 0x39c, 0xb6, 0xb7, 0xb8, 0xb9, 0xba, 0xbb, 0xbc, 0xbd, 0xbe, 0xbf, Any idea what 0x39c is? > Source/JavaScriptCore/yarr/YarrCanonicalizeUCS2.cpp:552 > + 0xd0, 0xd1, 0xd2, 0xd3, 0xd4, 0xd5, 0xd6, 0xf7, 0xd8, 0xd9, 0xda, 0xdb, 0xdc, 0xdd, 0xde, 0x178 ditto about 0x178? Can you add tests for these? > Source/JavaScriptCore/yarr/YarrJIT.cpp:1135 > + readCharacterDontDecodeSurrogates(0, patternCharacter, patternIndex); > + readCharacterDontDecodeSurrogates(m_checkedOffset - term->inputPosition, character); Indentation is wrong. > Source/JavaScriptCore/yarr/YarrJIT.cpp:1181 > + load32(Address(output, (subpatternId << 1) * sizeof(int)), patternIndex); > + load32(Address(output, ((subpatternId << 1) + 1) * sizeof(int)), patternTemp); Why "<< 1" everywhere instead of "* 2"? > Source/JavaScriptCore/yarr/YarrJIT.cpp:1263 > + load32(Address(output, (subpatternId << 1) * sizeof(int)), patternIndex); > + load32(Address(output, ((subpatternId << 1) + 1) * sizeof(int)), patternTemp); Why not just use TimesFour for these addresses instead of "* sizeof(int)"? Wouldn't we end up with better instruction encoding?
Comment on attachment 348713 [details] Updated pPatch with speculative build fixes View in context: https://bugs.webkit.org/attachment.cgi?id=348713&action=review >> Source/JavaScriptCore/yarr/YarrCanonicalizeUCS2.cpp:548 >> + 0xb0, 0xb1, 0xb2, 0xb3, 0xb4, 0x39c, 0xb6, 0xb7, 0xb8, 0xb9, 0xba, 0xbb, 0xbc, 0xbd, 0xbe, 0xbf, > > Any idea what 0x39c is? That is the appropriate toUpper() result for 0xb5. >> Source/JavaScriptCore/yarr/YarrCanonicalizeUCS2.cpp:552 >> + 0xd0, 0xd1, 0xd2, 0xd3, 0xd4, 0xd5, 0xd6, 0xf7, 0xd8, 0xd9, 0xda, 0xdb, 0xdc, 0xdd, 0xde, 0x178 > > ditto about 0x178? > > Can you add tests for these? That is the correct toUpper() result for 0xff. I'll add tests for these two cases. >> Source/JavaScriptCore/yarr/YarrJIT.cpp:1135 >> + readCharacterDontDecodeSurrogates(m_checkedOffset - term->inputPosition, character); > > Indentation is wrong. I'll land a fix for this. >> Source/JavaScriptCore/yarr/YarrJIT.cpp:1181 >> + load32(Address(output, ((subpatternId << 1) + 1) * sizeof(int)), patternTemp); > > Why "<< 1" everywhere instead of "* 2"? Following the convention already in this file. >> Source/JavaScriptCore/yarr/YarrJIT.cpp:1263 >> + load32(Address(output, ((subpatternId << 1) + 1) * sizeof(int)), patternTemp); > > Why not just use TimesFour for these addresses instead of "* sizeof(int)"? Wouldn't we end up with better instruction encoding? The (subpatternId << 1) * sizeof(int)) becomes an offset of subpattern * 2 * 4. The instruction coding is fine (see last two instructions): 7:OpTerm BackReference pattern #1 {0,...} non-greedy ... 0x29db3e40904e: jmp 0x29db3e4090ba 0x29db3e409053: mov 0x8(%rcx), %r9d 0x29db3e409057: mov 0xc(%rcx), %r10d
(In reply to Michael Saboff from comment #8) > Comment on attachment 348713 [details] > >> Source/JavaScriptCore/yarr/YarrCanonicalizeUCS2.cpp:552 > >> + 0xd0, 0xd1, 0xd2, 0xd3, 0xd4, 0xd5, 0xd6, 0xf7, 0xd8, 0xd9, 0xda, 0xdb, 0xdc, 0xdd, 0xde, 0x178 > > > > ditto about 0x178? > > > > Can you add tests for these? New tests tracked in https://bugs.webkit.org/show_bug.cgi?id=189291.