|Summary:||Simplify backtracking in YARR JIT|
|Product:||WebKit||Reporter:||Gavin Barraclough <barraclough>|
|Severity:||Normal||CC:||ggaren, msaboff, webkit.review.bot|
|Version:||528+ (Nightly build)|
|Bug Depends on:|
Description Gavin Barraclough 2011-05-15 16:44:26 PDT
YARR JIT currently performs a single pass of code generation over the pattern, with special handling to allow the code generation for some backtracking code out of line. We can simplify things by moving to a common mechanism whereby all forwards matching code is generated in one pass, and all backtracking code is generated in another. Backtracking code can be generated in reverse order, to optimized the common fall-through case. To make it easier to walk over the pattern, we can first convert to a more byte-code like format before JIT generating. In time we should unify this with the YARR interpreter to more closely unify the two.
Comment 2 WebKit Review Bot 2011-05-15 16:49:44 PDT
Comment 3 Gavin Barraclough 2011-05-15 17:26:28 PDT
Created attachment 93593 [details] Layout tests for expressions fixed by the prior patch.
Comment 4 Gavin Barraclough 2011-05-15 19:34:07 PDT
Fixing backtracking is blocking https://bugs.webkit.org/show_bug.cgi?id=60866.
Comment 5 Geoffrey Garen 2011-05-15 22:38:45 PDT
Please fix style bot issues, and remove this: +// breakpoint();
Comment 6 Geoffrey Garen 2011-05-15 23:04:00 PDT
There are a lot of great comments here, but also a lot of comments that describe the "what" and not the "why". I think you should remove comments that just describe the "what" (listed below). In theory, a "what" comment clutters the code, since it's redundant with what the code says. When a "what" comment does add something, that's probably a sign that the code it comments needs clearer names and abstractions. + break; // Nothing else to do! break back to the switch. + // Find the beginning of this set of alternatives. + // Check whether this set of alternatives are repeating or 'once through'. + // Update the input position. + // Link any input check failures from the prior alternative to this point. + // Calculate the difference in minimum size between the two alternatives, and + // update the input position. + // If the size of the pattern is not fixed, then we need to update the start index. + // Store the match start index, if we needed to & didn't do so above. + // If minimum size is zero, thenwe can store the current index directly. + // Increment position & check input for the first alternative - if sufficient + // input is available just to it, else jump to it's input check failure handler. + // Manage m_checked. + // Manage m_checked. + // For fixed count (always executed once) subpatterns, nothing to do here. + // Emit all alternatives, connect as a doubly linked list. + // Emit all alternatives, connect as a doubly linked list. + // If there are no repeating alternatives then emit an + // 'OpMatchFailed' to return no match. + // Run forwards over 'm_ops', generating the main matching path. + generate(); + // Run backwards over 'm_ops', generating code for backtracking. + backtrack();
Comment 7 Geoffrey Garen 2011-05-15 23:19:33 PDT
+ // The 'optimized' flag is used to null out the second pattern character, + // when two are fused to match a pair together. + bool m_optimized; How about "m_isDeadCode" or "m_didFuseWithPrevious" to make it more clear that this means the op can be removed because it fused with the previous op?
Comment 8 Gavin Barraclough 2011-05-15 23:28:13 PDT
Created attachment 93613 [details] Almost identical to the old patch, fix stylebot issues, remove 'what' comments, renamed m_optimized to m_isDeadCode, add backtrackTermDefault. Per a request over IRC by msaboff, have shared common implementations of term backtracking in backtrackTermDefault().
Comment 9 Michael Saboff 2011-05-16 00:02:18 PDT
Comment 10 Geoffrey Garen 2011-05-16 00:07:15 PDT
Comment on attachment 93613 [details] Almost identical to the old patch, fix stylebot issues, remove 'what' comments, renamed m_optimized to m_isDeadCode, add backtrackTermDefault. r=me, modulo Michael's comments above. Would be good to come up with a fix to the o(n^2) issue soon. These YarrOps really lend themselves to a future refactoring that makes them objects with virtual generate / backtrack functions.
Comment 11 Gavin Barraclough 2011-05-16 00:22:23 PDT
Fixed in r86536, will follow up with an investigation of alternatives to the O(N^2) code in backtracking.
Comment 12 Gavin Barraclough 2011-05-16 00:24:27 PDT
(This patch got about 1 1/2 hours of regex fuzzing before landing, and has no performance impact on sunspidey / v8).