Bug 60860 - Simplify backtracking in YARR JIT
Summary: Simplify backtracking in YARR JIT
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Gavin Barraclough
URL:
Keywords:
Depends on:
Blocks: 60866
  Show dependency treegraph
 
Reported: 2011-05-15 16:44 PDT by Gavin Barraclough
Modified: 2011-05-16 00:24 PDT (History)
3 users (show)

See Also:


Attachments
The patch (165.04 KB, patch)
2011-05-15 16:46 PDT, Gavin Barraclough
no flags Details | Formatted Diff | Diff
Layout tests for expressions fixed by the prior patch. (3.62 KB, patch)
2011-05-15 17:26 PDT, Gavin Barraclough
no flags Details | Formatted Diff | Diff
Almost identical to the old patch, fix stylebot issues, remove 'what' comments, renamed m_optimized to m_isDeadCode, add backtrackTermDefault. (166.83 KB, patch)
2011-05-15 23:28 PDT, Gavin Barraclough
ggaren: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Gavin Barraclough 2011-05-15 16:44:26 PDT
YARR JIT currently performs a single pass of code generation over the pattern, with special handling to allow the code generation for some backtracking code out of line.  We can simplify things by moving to a common mechanism whereby all forwards matching code is generated in one pass, and all backtracking code is generated in another.  Backtracking code can be generated in reverse order, to optimized the common fall-through case.

To make it easier to walk over the pattern, we can first convert to a more byte-code like format before JIT generating.  In time we should unify this with the YARR interpreter to more closely unify the two.
Comment 1 Gavin Barraclough 2011-05-15 16:46:56 PDT
Created attachment 93592 [details]
The patch
Comment 2 WebKit Review Bot 2011-05-15 16:49:44 PDT
Attachment 93592 [details] did not pass style-queue:

Failed to run "['Tools/Scripts/check-webkit-style', '--diff-files', u'Source/JavaScriptCore/ChangeLog', u'Source..." exit_code: 1

Source/JavaScriptCore/yarr/YarrJIT.cpp:1503:  Missing space before ( in while(  [whitespace/parens] [5]
Source/JavaScriptCore/yarr/YarrJIT.cpp:1581:  Should have only a single space after a punctuation in a comment.  [whitespace/comments] [5]
Source/JavaScriptCore/yarr/YarrJIT.cpp:1660:  Should have only a single space after a punctuation in a comment.  [whitespace/comments] [5]
Source/JavaScriptCore/yarr/YarrJIT.cpp:2002:  Should have only a single space after a punctuation in a comment.  [whitespace/comments] [5]
Source/JavaScriptCore/yarr/YarrJIT.cpp:2026:  Missing space before ( in while(  [whitespace/parens] [5]
Source/JavaScriptCore/yarr/YarrJIT.cpp:2399:  Should have only a single space after a punctuation in a comment.  [whitespace/comments] [5]
Total errors found: 6 in 2 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 3 Gavin Barraclough 2011-05-15 17:26:28 PDT
Created attachment 93593 [details]
Layout tests for expressions fixed by the prior patch.
Comment 4 Gavin Barraclough 2011-05-15 19:34:07 PDT
Fixing backtracking is blocking https://bugs.webkit.org/show_bug.cgi?id=60866.
Comment 5 Geoffrey Garen 2011-05-15 22:38:45 PDT
Please fix style bot issues, and remove this:

+//        breakpoint();
Comment 6 Geoffrey Garen 2011-05-15 23:04:00 PDT
There are a lot of great comments here, but also a lot of comments that describe the "what" and not the "why". I think you should remove comments that just describe the "what" (listed below).

In theory, a "what" comment clutters the code, since it's redundant with what the code says. When a "what" comment does add something, that's probably a sign that the code it comments needs clearer names and abstractions.

+                    break; // Nothing else to do! break back to the switch.

+                // Find the beginning of this set of alternatives.

+                // Check whether this set of alternatives are repeating or 'once through'.

+                        // Update the input position.

+                    // Link any input check failures from the prior alternative to this point.

+                    // Calculate the difference in minimum size between the two alternatives, and
+                    // update the input position.

+                // If the size of the pattern is not fixed, then we need to update the start index.

+                // Store the match start index, if we needed to & didn't do so above.

+                    // If minimum size is zero, thenwe can store the current index directly.

+                    // Increment position & check input for the first alternative - if sufficient
+                    // input is available just to it, else jump to it's input check failure handler.

+                // Manage m_checked.

+                // Manage m_checked.

+                // For fixed count (always executed once) subpatterns, nothing to do here.

+        // Emit all alternatives, connect as a doubly linked list.

+        // Emit all alternatives, connect as a doubly linked list.

+        // If there are no repeating alternatives then emit an
+        // 'OpMatchFailed' to return no match.

+        // Run forwards over 'm_ops', generating the main matching path.
+        generate();
+        // Run backwards over 'm_ops', generating code for backtracking.
+        backtrack();
Comment 7 Geoffrey Garen 2011-05-15 23:19:33 PDT
+        // The 'optimized' flag is used to null out the second pattern character,
+        // when two are fused to match a pair together.
+        bool m_optimized;

How about "m_isDeadCode" or "m_didFuseWithPrevious" to make it more clear that this means the op can be removed because it fused with the previous op?
Comment 8 Gavin Barraclough 2011-05-15 23:28:13 PDT
Created attachment 93613 [details]
Almost identical to the old patch, fix stylebot issues, remove 'what' comments, renamed m_optimized to m_isDeadCode, add backtrackTermDefault.

Per a request over IRC by msaboff, have shared common implementations of term backtracking in backtrackTermDefault().
Comment 9 Michael Saboff 2011-05-16 00:02:18 PDT
Comments from reviewing the first patch.  Couldn't publish due to "midair" collision.

View in context: https://bugs.webkit.org/attachment.cgi?id=93592&action=review

> Source/JavaScriptCore/yarr/YarrJIT.cpp:354
> +        // The operation, as a YarOpCode, and also a reference to the PatternTerm.
Spelling of YarOpCode

> Source/JavaScriptCore/yarr/YarrJIT.cpp:376
> +        bool m_optimized;
Too generic a name for the specific character pairing optimization.

> Source/JavaScriptCore/yarr/YarrJIT.cpp:378
> +        // Currentlyused in the case of some of the more complex management of

Currently used

> Source/JavaScriptCore/yarr/YarrJIT.cpp:384
> +        // value that will be puched into the pattern's frame to return to,

*pushed*

> Source/JavaScriptCore/yarr/YarrJIT.cpp:390
> +    // This class encapsualtes information about the state of code generation

*encapsulates*

> Source/JavaScriptCore/yarr/YarrJIT.cpp:642
> +        YarrOp& nextOp = m_ops[opIndex + 1];

Added ASSERT(opIndex + 1 < m_ops.size()); before.

> Source/JavaScriptCore/yarr/YarrJIT.cpp:1154
> +            // In the case 'once through' expressions, the End node will laso have a reentry
*also*

> Source/JavaScriptCore/yarr/YarrJIT.cpp:1196
> +

Remove extra blank line.

> Source/JavaScriptCore/yarr/YarrJIT.cpp:1200
> +

ditto

> Source/JavaScriptCore/yarr/YarrJIT.cpp:1231
> +            // we do not need to be able  to backtrack back into any alternative other than

remove extra space between able & to.

> Source/JavaScriptCore/yarr/YarrJIT.cpp:1280
> +                // alternatives bve responsible for checking that input is available (if

spelling of "bve"?

> Source/JavaScriptCore/yarr/YarrJIT.cpp:1759
> +            // In the case of non-simple nested assertions we nedd to also link the

*need*

> Source/JavaScriptCore/yarr/YarrJIT.cpp:1782
> +                // In the case of a single alterantive on its own, we don't need to

*alternative*

> Source/JavaScriptCore/yarr/YarrJIT.cpp:1784
> +                // continue to backtrack out of the paretheses without jumping.

*parentheses*

> Source/JavaScriptCore/yarr/YarrJIT.cpp:1792
> +                // If the alternative had adjusted the input possition we must link
*position*

> Source/JavaScriptCore/yarr/YarrJIT.cpp:1803
> +                        // The last of more than one alterantives must jump back to the begnning.
*alternatives*

> Source/JavaScriptCore/yarr/YarrJIT.cpp:1902
> +                    // If caputring, clear the capture (we only need to reset start).
*capturing*

> Source/JavaScriptCore/yarr/YarrJIT.cpp:1929
> +                    // Check whether we should backtrack back into the paretheses, or if we
*parentheses*

> Source/JavaScriptCore/yarr/YarrJIT.cpp:1972
> +                // We should never be backtracking  to here (hence the 'terminal' in the name).

remove extra space between backtracking and to
Comment 10 Geoffrey Garen 2011-05-16 00:07:15 PDT
Comment on attachment 93613 [details]
Almost identical to the old patch, fix stylebot issues, remove 'what' comments, renamed m_optimized to m_isDeadCode, add backtrackTermDefault.

r=me, modulo Michael's comments above.

Would be good to come up with a fix to the o(n^2) issue soon.

These YarrOps really lend themselves to a future refactoring that makes them objects with virtual generate / backtrack functions.
Comment 11 Gavin Barraclough 2011-05-16 00:22:23 PDT
Fixed in r86536, will follow up with an investigation of alternatives to the O(N^2) code in backtracking.
Comment 12 Gavin Barraclough 2011-05-16 00:24:27 PDT
(This patch got about 1 1/2 hours of regex fuzzing before landing, and has no performance impact on sunspidey / v8).