Bug 42159 - Matching regular expressions which contain iterative parentheses with YARR JIT
Summary: Matching regular expressions which contain iterative parentheses with YARR JIT
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Peter Varga
URL:
Keywords:
Depends on: 42264
Blocks:
  Show dependency treegraph
 
Reported: 2010-07-13 05:50 PDT by Peter Varga
Modified: 2010-07-15 09:08 PDT (History)
4 users (show)

See Also:


Attachments
work in progress patch (16.26 KB, patch)
2010-07-14 08:23 PDT, Peter Varga
no flags Details | Formatted Diff | Diff
performance testcase (666 bytes, application/javascript)
2010-07-14 08:26 PDT, Peter Varga
no flags Details
measurement script (125 bytes, application/javascript)
2010-07-14 08:32 PDT, Peter Varga
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Peter Varga 2010-07-13 05:50:15 PDT
The current YARR JIT implementation doesn't support iterative parentheses matching so
it does fallback to pcre in these cases which prevents us from removing pcre from JSC.
Comment 1 Peter Varga 2010-07-14 08:23:56 PDT
Created attachment 61521 [details]
work in progress patch

This patch adds support for matching regular expressions which contain iterative
parentheses to YARR JIT. For applying this patch you need a JSC patched by the
"Prepare YARR JIT for matching regexps with iterative parentheses" 
(https://bugs.webkit.org/show_bug.cgi?id=42264) bug's patch because this solution's logic
is based on that.

The JIT generated code does match by the following simple logic:
Every time when matching the iterative parentheses the previous iteration's
match is stored on the stack marked by the subpattern ID. Therefore in
case of backtrack the result of the last successful matching can be restored.

The implementation isn't complete, the following cases which are within
iterative parentheses aren't handled therefore these still do fallback to
pcre:

- disjunction of parentheses contain more than one alternative
- the number of iterations is fixed (generateParenthesesFixed)
- in case of non-greedy matching (generateParenthesesNonGreedy)

However this solution passes all the regression tests in case of greedy matching with an upper limit 
the result can be wrong and the algorithm doesn't fallback to pcre, so this is also needs further work. These critical 
points are marked in the code by TODOs.

All the fast/js layout tests pass as well, except one: regexp-overflow.html
This test doesn't throw an expected exception in case of large number of iteration, but
YARR Interpreter doesn't throw an exception neither. 
However two layouts are fixed in the fast/js:

- sputnik/Conformance/15_Native_Objects/15.10_RegExp/15.10.2/15.10.2.5_Term/S15.10.2.5_A1_T4.html
- sputnik/Conformance/15_Native_Objects/15.10_RegExp/15.10.6/15.10.6.2_RegExp.prototype.exec/S15.10.6.2_A1_T6.html

NOTE: This patch is just a suggestion, not the final solution. I don't offer
this for a review now, I just would like to get some feedback before finishing
the whole work on iterative parentheses matching.

Please don't hesitate to comment!
Comment 2 Peter Varga 2010-07-14 08:26:04 PDT
Created attachment 61522 [details]
performance testcase

I wrote a performance test for benchmarking greedy matching of parentheses.
Results of measurements on Linux:
reference:		~3050ms
with this patch: 	~1150ms
Comment 3 Peter Varga 2010-07-14 08:32:29 PDT
Created attachment 61524 [details]
measurement script

A simple script for measuring the performance of the testcase. Simply run it with the jsc binary.
Comment 4 Geoffrey Garen 2010-07-14 10:09:33 PDT
Is there a way to avoid stack overflow in cases of substantial backtracking?
Comment 5 Peter Varga 2010-07-15 09:05:51 PDT
The dependency of this patch was changed and this caused performance improvements.

These are the performance results:

				  reference		modified
regexp-dna:			22.2ms +/- 1.4%    22.0ms +/- 1.5%
v8-regexp:	1.032x as fast	435.1ms +/- 1.2%    421.6ms +/- 1.2%
Comment 6 Peter Varga 2010-07-15 09:08:36 PDT
(In reply to comment #4)
> Is there a way to avoid stack overflow in cases of substantial backtracking?

Hi Geoffrey, there was some discussion about stack guards on the preparing patch.
You can see: https://bugs.webkit.org/show_bug.cgi?id=42264