Bug 60860

Summary: Simplify backtracking in YARR JIT
Product: WebKit Reporter: Gavin Barraclough <barraclough>
Component: JavaScriptCoreAssignee: Gavin Barraclough <barraclough>
Status: RESOLVED FIXED    
Severity: Normal CC: ggaren, msaboff, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on:    
Bug Blocks: 60866    
Attachments:
Description Flags
The patch
none
Layout tests for expressions fixed by the prior patch.
none
Almost identical to the old patch, fix stylebot issues, remove 'what' comments, renamed m_optimized to m_isDeadCode, add backtrackTermDefault. ggaren: review+

Gavin Barraclough
Reported 2011-05-15 16:44:26 PDT
YARR JIT currently performs a single pass of code generation over the pattern, with special handling to allow the code generation for some backtracking code out of line. We can simplify things by moving to a common mechanism whereby all forwards matching code is generated in one pass, and all backtracking code is generated in another. Backtracking code can be generated in reverse order, to optimized the common fall-through case. To make it easier to walk over the pattern, we can first convert to a more byte-code like format before JIT generating. In time we should unify this with the YARR interpreter to more closely unify the two.
Attachments
The patch (165.04 KB, patch)
2011-05-15 16:46 PDT, Gavin Barraclough
no flags
Layout tests for expressions fixed by the prior patch. (3.62 KB, patch)
2011-05-15 17:26 PDT, Gavin Barraclough
no flags
Almost identical to the old patch, fix stylebot issues, remove 'what' comments, renamed m_optimized to m_isDeadCode, add backtrackTermDefault. (166.83 KB, patch)
2011-05-15 23:28 PDT, Gavin Barraclough
ggaren: review+
Gavin Barraclough
Comment 1 2011-05-15 16:46:56 PDT
Created attachment 93592 [details] The patch
WebKit Review Bot
Comment 2 2011-05-15 16:49:44 PDT
Attachment 93592 [details] did not pass style-queue: Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1 Source/JavaScriptCore/yarr/YarrJIT.cpp:1503: Missing space before ( in while( [whitespace/parens] [5] Source/JavaScriptCore/yarr/YarrJIT.cpp:1581: Should have only a single space after a punctuation in a comment. [whitespace/comments] [5] Source/JavaScriptCore/yarr/YarrJIT.cpp:1660: Should have only a single space after a punctuation in a comment. [whitespace/comments] [5] Source/JavaScriptCore/yarr/YarrJIT.cpp:2002: Should have only a single space after a punctuation in a comment. [whitespace/comments] [5] Source/JavaScriptCore/yarr/YarrJIT.cpp:2026: Missing space before ( in while( [whitespace/parens] [5] Source/JavaScriptCore/yarr/YarrJIT.cpp:2399: Should have only a single space after a punctuation in a comment. [whitespace/comments] [5] Total errors found: 6 in 2 files If any of these errors are false positives, please file a bug against check-webkit-style.
Gavin Barraclough
Comment 3 2011-05-15 17:26:28 PDT
Created attachment 93593 [details] Layout tests for expressions fixed by the prior patch.
Gavin Barraclough
Comment 4 2011-05-15 19:34:07 PDT
Fixing backtracking is blocking https://bugs.webkit.org/show_bug.cgi?id=60866.
Geoffrey Garen
Comment 5 2011-05-15 22:38:45 PDT
Please fix style bot issues, and remove this: +// breakpoint();
Geoffrey Garen
Comment 6 2011-05-15 23:04:00 PDT
There are a lot of great comments here, but also a lot of comments that describe the "what" and not the "why". I think you should remove comments that just describe the "what" (listed below). In theory, a "what" comment clutters the code, since it's redundant with what the code says. When a "what" comment does add something, that's probably a sign that the code it comments needs clearer names and abstractions. + break; // Nothing else to do! break back to the switch. + // Find the beginning of this set of alternatives. + // Check whether this set of alternatives are repeating or 'once through'. + // Update the input position. + // Link any input check failures from the prior alternative to this point. + // Calculate the difference in minimum size between the two alternatives, and + // update the input position. + // If the size of the pattern is not fixed, then we need to update the start index. + // Store the match start index, if we needed to & didn't do so above. + // If minimum size is zero, thenwe can store the current index directly. + // Increment position & check input for the first alternative - if sufficient + // input is available just to it, else jump to it's input check failure handler. + // Manage m_checked. + // Manage m_checked. + // For fixed count (always executed once) subpatterns, nothing to do here. + // Emit all alternatives, connect as a doubly linked list. + // Emit all alternatives, connect as a doubly linked list. + // If there are no repeating alternatives then emit an + // 'OpMatchFailed' to return no match. + // Run forwards over 'm_ops', generating the main matching path. + generate(); + // Run backwards over 'm_ops', generating code for backtracking. + backtrack();
Geoffrey Garen
Comment 7 2011-05-15 23:19:33 PDT
+ // The 'optimized' flag is used to null out the second pattern character, + // when two are fused to match a pair together. + bool m_optimized; How about "m_isDeadCode" or "m_didFuseWithPrevious" to make it more clear that this means the op can be removed because it fused with the previous op?
Gavin Barraclough
Comment 8 2011-05-15 23:28:13 PDT
Created attachment 93613 [details] Almost identical to the old patch, fix stylebot issues, remove 'what' comments, renamed m_optimized to m_isDeadCode, add backtrackTermDefault. Per a request over IRC by msaboff, have shared common implementations of term backtracking in backtrackTermDefault().
Michael Saboff
Comment 9 2011-05-16 00:02:18 PDT
Comments from reviewing the first patch. Couldn't publish due to "midair" collision. View in context: https://bugs.webkit.org/attachment.cgi?id=93592&action=review > Source/JavaScriptCore/yarr/YarrJIT.cpp:354 > + // The operation, as a YarOpCode, and also a reference to the PatternTerm. Spelling of YarOpCode > Source/JavaScriptCore/yarr/YarrJIT.cpp:376 > + bool m_optimized; Too generic a name for the specific character pairing optimization. > Source/JavaScriptCore/yarr/YarrJIT.cpp:378 > + // Currentlyused in the case of some of the more complex management of Currently used > Source/JavaScriptCore/yarr/YarrJIT.cpp:384 > + // value that will be puched into the pattern's frame to return to, *pushed* > Source/JavaScriptCore/yarr/YarrJIT.cpp:390 > + // This class encapsualtes information about the state of code generation *encapsulates* > Source/JavaScriptCore/yarr/YarrJIT.cpp:642 > + YarrOp& nextOp = m_ops[opIndex + 1]; Added ASSERT(opIndex + 1 < m_ops.size()); before. > Source/JavaScriptCore/yarr/YarrJIT.cpp:1154 > + // In the case 'once through' expressions, the End node will laso have a reentry *also* > Source/JavaScriptCore/yarr/YarrJIT.cpp:1196 > + Remove extra blank line. > Source/JavaScriptCore/yarr/YarrJIT.cpp:1200 > + ditto > Source/JavaScriptCore/yarr/YarrJIT.cpp:1231 > + // we do not need to be able to backtrack back into any alternative other than remove extra space between able & to. > Source/JavaScriptCore/yarr/YarrJIT.cpp:1280 > + // alternatives bve responsible for checking that input is available (if spelling of "bve"? > Source/JavaScriptCore/yarr/YarrJIT.cpp:1759 > + // In the case of non-simple nested assertions we nedd to also link the *need* > Source/JavaScriptCore/yarr/YarrJIT.cpp:1782 > + // In the case of a single alterantive on its own, we don't need to *alternative* > Source/JavaScriptCore/yarr/YarrJIT.cpp:1784 > + // continue to backtrack out of the paretheses without jumping. *parentheses* > Source/JavaScriptCore/yarr/YarrJIT.cpp:1792 > + // If the alternative had adjusted the input possition we must link *position* > Source/JavaScriptCore/yarr/YarrJIT.cpp:1803 > + // The last of more than one alterantives must jump back to the begnning. *alternatives* > Source/JavaScriptCore/yarr/YarrJIT.cpp:1902 > + // If caputring, clear the capture (we only need to reset start). *capturing* > Source/JavaScriptCore/yarr/YarrJIT.cpp:1929 > + // Check whether we should backtrack back into the paretheses, or if we *parentheses* > Source/JavaScriptCore/yarr/YarrJIT.cpp:1972 > + // We should never be backtracking to here (hence the 'terminal' in the name). remove extra space between backtracking and to
Geoffrey Garen
Comment 10 2011-05-16 00:07:15 PDT
Comment on attachment 93613 [details] Almost identical to the old patch, fix stylebot issues, remove 'what' comments, renamed m_optimized to m_isDeadCode, add backtrackTermDefault. r=me, modulo Michael's comments above. Would be good to come up with a fix to the o(n^2) issue soon. These YarrOps really lend themselves to a future refactoring that makes them objects with virtual generate / backtrack functions.
Gavin Barraclough
Comment 11 2011-05-16 00:22:23 PDT
Fixed in r86536, will follow up with an investigation of alternatives to the O(N^2) code in backtracking.
Gavin Barraclough
Comment 12 2011-05-16 00:24:27 PDT
(This patch got about 1 1/2 hours of regex fuzzing before landing, and has no performance impact on sunspidey / v8).
Note You need to log in before you can comment on or make changes to this bug.