Summary: | Regexp JIT'ed Code Contains Branches to Branches for Backtracking | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | WebKit | Reporter: | Michael Saboff <msaboff> | ||||||||||
Component: | JavaScriptCore | Assignee: | Michael Saboff <msaboff> | ||||||||||
Status: | RESOLVED FIXED | ||||||||||||
Severity: | Enhancement | CC: | barraclough, commit-queue, loki, ossy | ||||||||||
Priority: | P2 | ||||||||||||
Version: | 528+ (Nightly build) | ||||||||||||
Hardware: | All | ||||||||||||
OS: | All | ||||||||||||
Attachments: |
|
Description
Michael Saboff
2010-11-30 16:47:43 PST
Created attachment 75232 [details]
Patch to remove most paren related branches to branches
This patch propagates branches from within parentheses backtracking code to later alternatives. It also emits the parentheses failure code at the end of the routine. It handles indirect branches as backtracking destinations and emits a trampoline to an indirect branch when needed.
Some cleanups were done as part of this patch. Specifically the invertOrCapture flag was separated into two.
The code generated by this patch for the same regular expression in the description is:
push %rbp
mov %rsp,%rbp
push %rbx
mov %esi,(%rcx)
sub $0x10,%rsp
add $0x2,%esi
cmp %edx,%esi
ja nextChar
loop:
mov %rsi,%rax
add $0xfffffffffffffffe,%eax
mov %eax,0x8(%rcx)
cmpw $0x61,-0x4(%rdi,%rsi,2)
jne tryAlt1A
mov $tryAlt1A,%r11
mov %r11,(%rsp)
jmpq term1Match
tryAlt1A:
cmpw $0x41,-0x4(%rdi,%rsi,2)
jne term1Fail
mov $term1Fail,%r11
mov %r11,(%rsp)
jmpq term1Match
term1Match:
mov %rsi,%rax
add $0xffffffffffffffff,%eax
mov %eax,0xc(%rcx)
mov %rsi,%rax
add $0xffffffffffffffff,%eax
mov %eax,0x10(%rcx)
cmpw $0x62,-0x2(%rdi,%rsi,2)
jne tryAlt2B
mov $tryAlt2B,%r11
mov %r11,0x8(%rsp)
jmpq term2Match
tryAlt2B:
cmpw $0x42,-0x2(%rdi,%rsi,2)
jne term2Fail
mov $term2Fail,%r11
mov %r11,0x8(%rsp)
jmpq term2Match
term2Match:
mov %esi,0x14(%rcx)
add $0x10,%rsp
mov (%rcx),%eax
mov %esi,0x4(%rcx)
pop %rbx
pop %rbp
retq
nextChar:
mov %rsi,%rax
sub $0x1,%eax
mov %eax,(%rcx)
add $0x1,%esi
cmp %edx,%esi
jbe loop:
add $0x10,%rsp
mov $0xffffffff,%eax
pop %rbx
pop %rbp
retq
term2Fail:
movl $0xffffffff,0x10(%rcx)
jmpq *(%rsp)
term1Fail:
movl $0xffffffff,0x8(%rcx)
jmpq nextChar
Comment on attachment 75232 [details] Patch to remove most paren related branches to branches View in context: https://bugs.webkit.org/attachment.cgi?id=75232&action=review > JavaScriptCore/yarr/RegexJIT.cpp:295 > + struct OffsetHashTraits : GenericHashTraits<int> { > + static const bool emptyValueIsZero = false; > + static const bool needsDestruction = false; > + static int emptyValue() { return -1; } > + }; As discussed in person, I think it would be better to store the stack offset as a uint32_t and use UnsignedWithZeroKeyHashTraits<uint32_t> from HashTraits.h instead. Created attachment 75323 [details]
Updated Proposed Patch
This patch addresses the concern raised by the prior patch and now uses UnsignedWithZeroKeyHashTraits<uint32_t>.
Fixed the jump to next instruction in the success case. Fixed a bug in the updated fast/regex/pcre tests.
Added ChangeLogs.
Comment on attachment 75323 [details]
Updated Proposed Patch
working on a bug. Will post an updated patch when available.
Hi Michael, The patch generally looks fine, few small issues. (1) You use the variable name 'reGen', but our style guidelines do not allow abbreviations. I suggest renaming to 'generator'. (2) You add a couple of methods with names prefixed 'have' to BacktrackDestination – throughout JSC we tend to use 'has' for these kinds of method names. Though objectively equivalent to each other, for consistency I think we should stick to 'has'. (3) You add a method used() to DataLabelPtr – in the long run, I'm not sure this is safe, so I'd rather not take this step at this point. From a security perspective, we'll want to harder the assembler against integer overflow. I think logically in the assembler we want all 32bit values to be valid (otherwise converting an offset to a label would require a guard). So we should not be stealing bits from these offsets. I'm aware that in label we already do so, but I think we should consider that a bug to be fixed, rather than a design pattern to copy. (On the other hand I'm fine with exposing the isSet() method from the assembler). Rather than stealing bits from DataLabelPtr, there are bits you could steal from within BacktrackDestination – BacktrackType doesn't need to be using a full 32-bits, you could steal from here, and this would reduce the size of the patch. (4) Also in BacktrackDestination - setDataLabelPtr. I think this method could do with a slightly more descriptive name, that describes more why you are calling it rather than what it does. The DataLabelPtr is recording a store instruction that we be be repatched to write the address to be jumped to to resume this disjunction when matching, right? Errk, not sure how succinct and descriptive a name we can come up with for this, I think setBacktrackStoreLocation would be an improvement, feel free to do better. (5) Finally, in these cases the extra braces around the else statements should be removed. 996 } else { 997 m_backtrack.setBacktrackJumpList(&jumpsToNext); 998 } 999 } else { 1000 nextBacktrackFallThrough = false; 1001 } 1504 } else if (state.isLastAlternative()) { 1505 propogateBacktrack = true; 1506 } cheers, G. Created attachment 75425 [details]
Updated patch addressing reviewers concerns and fixing a bug.
Fixed a bug where non fixed sized paren backtrack handling was going to the outside of the paren code instead of the inside the paren code and a related issue were some nested paren handling wasn't saving the right backtrack address on the stack. Added test cases for this fix.
Addressed the concerns raised from the prior patch. For point 3, changed the dataLabelPtr.isUsed() to .isSet(), which is more in keeping with the logic. For point 4, changed BacktrackDestination::setDataLabelPtr to setBacktrackDataLabel.
Comment on attachment 75425 [details]
Updated patch addressing reviewers concerns and fixing a bug.
Looks great Michael.
Comment on attachment 75425 [details] Updated patch addressing reviewers concerns and fixing a bug. Rejecting patch 75425 from commit-queue. Failed to run "['./WebKitTools/Scripts/webkit-patch', '--status-host=queues.webkit.org', '--bot-id=abarth-cq-sl', 'apply-attachment', '--force-clean', '--non-interactive', 75425]" exit_code: 2 Last 500 characters of output: et -15 lines). patching file JavaScriptCore/yarr/RegexPattern.h Hunk #5 succeeded at 166 with fuzz 2. patching file LayoutTests/ChangeLog Hunk #1 succeeded at 1 with fuzz 3. patching file LayoutTests/fast/regex/parentheses-expected.txt patching file LayoutTests/fast/regex/parentheses.html patching file LayoutTests/fast/regex/script-tests/parentheses.js Failed to run "[u'/Users/abarth/git/webkit-queue/WebKitTools/Scripts/svn-apply', u'--reviewer', u'Gavin Barraclough', u'--force']" exit_code: 1 Full output: http://queues.webkit.org/results/6804018 Created attachment 75507 [details]
Patch with updates for latest top of trunk.
Comment on attachment 75507 [details] Patch with updates for latest top of trunk. Clearing flags on attachment: 75507 Committed r73307: <http://trac.webkit.org/changeset/73307> All reviewed patches have been landed. Closing bug. |