Summary: | Simplify backtracking in YARR JIT | ||
---|---|---|---|
Product: | WebKit | Reporter: | Gavin Barraclough <barraclough> |
Component: | JavaScriptCore | Assignee: | Gavin Barraclough <barraclough> |
Status: | RESOLVED FIXED | ||
Severity: | Normal | CC: | ggaren, msaboff, webkit.review.bot |
Priority: | P2 | ||
Version: | 528+ (Nightly build) | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Bug Depends on: | |||
Bug Blocks: | 60866 | ||
Attachments: |
Description
Gavin Barraclough
2011-05-15 16:44:26 PDT
Created attachment 93592 [details]
The patch
Attachment 93592 [details] did not pass style-queue:
Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1
Source/JavaScriptCore/yarr/YarrJIT.cpp:1503: Missing space before ( in while( [whitespace/parens] [5]
Source/JavaScriptCore/yarr/YarrJIT.cpp:1581: Should have only a single space after a punctuation in a comment. [whitespace/comments] [5]
Source/JavaScriptCore/yarr/YarrJIT.cpp:1660: Should have only a single space after a punctuation in a comment. [whitespace/comments] [5]
Source/JavaScriptCore/yarr/YarrJIT.cpp:2002: Should have only a single space after a punctuation in a comment. [whitespace/comments] [5]
Source/JavaScriptCore/yarr/YarrJIT.cpp:2026: Missing space before ( in while( [whitespace/parens] [5]
Source/JavaScriptCore/yarr/YarrJIT.cpp:2399: Should have only a single space after a punctuation in a comment. [whitespace/comments] [5]
Total errors found: 6 in 2 files
If any of these errors are false positives, please file a bug against check-webkit-style.
Created attachment 93593 [details]
Layout tests for expressions fixed by the prior patch.
Fixing backtracking is blocking https://bugs.webkit.org/show_bug.cgi?id=60866. Please fix style bot issues, and remove this: +// breakpoint(); There are a lot of great comments here, but also a lot of comments that describe the "what" and not the "why". I think you should remove comments that just describe the "what" (listed below). In theory, a "what" comment clutters the code, since it's redundant with what the code says. When a "what" comment does add something, that's probably a sign that the code it comments needs clearer names and abstractions. + break; // Nothing else to do! break back to the switch. + // Find the beginning of this set of alternatives. + // Check whether this set of alternatives are repeating or 'once through'. + // Update the input position. + // Link any input check failures from the prior alternative to this point. + // Calculate the difference in minimum size between the two alternatives, and + // update the input position. + // If the size of the pattern is not fixed, then we need to update the start index. + // Store the match start index, if we needed to & didn't do so above. + // If minimum size is zero, thenwe can store the current index directly. + // Increment position & check input for the first alternative - if sufficient + // input is available just to it, else jump to it's input check failure handler. + // Manage m_checked. + // Manage m_checked. + // For fixed count (always executed once) subpatterns, nothing to do here. + // Emit all alternatives, connect as a doubly linked list. + // Emit all alternatives, connect as a doubly linked list. + // If there are no repeating alternatives then emit an + // 'OpMatchFailed' to return no match. + // Run forwards over 'm_ops', generating the main matching path. + generate(); + // Run backwards over 'm_ops', generating code for backtracking. + backtrack(); + // The 'optimized' flag is used to null out the second pattern character, + // when two are fused to match a pair together. + bool m_optimized; How about "m_isDeadCode" or "m_didFuseWithPrevious" to make it more clear that this means the op can be removed because it fused with the previous op? Created attachment 93613 [details]
Almost identical to the old patch, fix stylebot issues, remove 'what' comments, renamed m_optimized to m_isDeadCode, add backtrackTermDefault.
Per a request over IRC by msaboff, have shared common implementations of term backtracking in backtrackTermDefault().
Comments from reviewing the first patch. Couldn't publish due to "midair" collision. View in context: https://bugs.webkit.org/attachment.cgi?id=93592&action=review > Source/JavaScriptCore/yarr/YarrJIT.cpp:354 > + // The operation, as a YarOpCode, and also a reference to the PatternTerm. Spelling of YarOpCode > Source/JavaScriptCore/yarr/YarrJIT.cpp:376 > + bool m_optimized; Too generic a name for the specific character pairing optimization. > Source/JavaScriptCore/yarr/YarrJIT.cpp:378 > + // Currentlyused in the case of some of the more complex management of Currently used > Source/JavaScriptCore/yarr/YarrJIT.cpp:384 > + // value that will be puched into the pattern's frame to return to, *pushed* > Source/JavaScriptCore/yarr/YarrJIT.cpp:390 > + // This class encapsualtes information about the state of code generation *encapsulates* > Source/JavaScriptCore/yarr/YarrJIT.cpp:642 > + YarrOp& nextOp = m_ops[opIndex + 1]; Added ASSERT(opIndex + 1 < m_ops.size()); before. > Source/JavaScriptCore/yarr/YarrJIT.cpp:1154 > + // In the case 'once through' expressions, the End node will laso have a reentry *also* > Source/JavaScriptCore/yarr/YarrJIT.cpp:1196 > + Remove extra blank line. > Source/JavaScriptCore/yarr/YarrJIT.cpp:1200 > + ditto > Source/JavaScriptCore/yarr/YarrJIT.cpp:1231 > + // we do not need to be able to backtrack back into any alternative other than remove extra space between able & to. > Source/JavaScriptCore/yarr/YarrJIT.cpp:1280 > + // alternatives bve responsible for checking that input is available (if spelling of "bve"? > Source/JavaScriptCore/yarr/YarrJIT.cpp:1759 > + // In the case of non-simple nested assertions we nedd to also link the *need* > Source/JavaScriptCore/yarr/YarrJIT.cpp:1782 > + // In the case of a single alterantive on its own, we don't need to *alternative* > Source/JavaScriptCore/yarr/YarrJIT.cpp:1784 > + // continue to backtrack out of the paretheses without jumping. *parentheses* > Source/JavaScriptCore/yarr/YarrJIT.cpp:1792 > + // If the alternative had adjusted the input possition we must link *position* > Source/JavaScriptCore/yarr/YarrJIT.cpp:1803 > + // The last of more than one alterantives must jump back to the begnning. *alternatives* > Source/JavaScriptCore/yarr/YarrJIT.cpp:1902 > + // If caputring, clear the capture (we only need to reset start). *capturing* > Source/JavaScriptCore/yarr/YarrJIT.cpp:1929 > + // Check whether we should backtrack back into the paretheses, or if we *parentheses* > Source/JavaScriptCore/yarr/YarrJIT.cpp:1972 > + // We should never be backtracking to here (hence the 'terminal' in the name). remove extra space between backtracking and to Comment on attachment 93613 [details]
Almost identical to the old patch, fix stylebot issues, remove 'what' comments, renamed m_optimized to m_isDeadCode, add backtrackTermDefault.
r=me, modulo Michael's comments above.
Would be good to come up with a fix to the o(n^2) issue soon.
These YarrOps really lend themselves to a future refactoring that makes them objects with virtual generate / backtrack functions.
Fixed in r86536, will follow up with an investigation of alternatives to the O(N^2) code in backtracking. (This patch got about 1 1/2 hours of regex fuzzing before landing, and has no performance impact on sunspidey / v8). |