The current YARR JIT implementation doesn't support iterative parentheses matching so it does fallback to pcre in these cases which prevents us from removing pcre from JSC.
Created attachment 61521 [details] work in progress patch This patch adds support for matching regular expressions which contain iterative parentheses to YARR JIT. For applying this patch you need a JSC patched by the "Prepare YARR JIT for matching regexps with iterative parentheses" (https://bugs.webkit.org/show_bug.cgi?id=42264) bug's patch because this solution's logic is based on that. The JIT generated code does match by the following simple logic: Every time when matching the iterative parentheses the previous iteration's match is stored on the stack marked by the subpattern ID. Therefore in case of backtrack the result of the last successful matching can be restored. The implementation isn't complete, the following cases which are within iterative parentheses aren't handled therefore these still do fallback to pcre: - disjunction of parentheses contain more than one alternative - the number of iterations is fixed (generateParenthesesFixed) - in case of non-greedy matching (generateParenthesesNonGreedy) However this solution passes all the regression tests in case of greedy matching with an upper limit the result can be wrong and the algorithm doesn't fallback to pcre, so this is also needs further work. These critical points are marked in the code by TODOs. All the fast/js layout tests pass as well, except one: regexp-overflow.html This test doesn't throw an expected exception in case of large number of iteration, but YARR Interpreter doesn't throw an exception neither. However two layouts are fixed in the fast/js: - sputnik/Conformance/15_Native_Objects/15.10_RegExp/15.10.2/15.10.2.5_Term/S15.10.2.5_A1_T4.html - sputnik/Conformance/15_Native_Objects/15.10_RegExp/15.10.6/15.10.6.2_RegExp.prototype.exec/S15.10.6.2_A1_T6.html NOTE: This patch is just a suggestion, not the final solution. I don't offer this for a review now, I just would like to get some feedback before finishing the whole work on iterative parentheses matching. Please don't hesitate to comment!
Created attachment 61522 [details] performance testcase I wrote a performance test for benchmarking greedy matching of parentheses. Results of measurements on Linux: reference: ~3050ms with this patch: ~1150ms
Created attachment 61524 [details] measurement script A simple script for measuring the performance of the testcase. Simply run it with the jsc binary.
Is there a way to avoid stack overflow in cases of substantial backtracking?
The dependency of this patch was changed and this caused performance improvements. These are the performance results: reference modified regexp-dna: 22.2ms +/- 1.4% 22.0ms +/- 1.5% v8-regexp: 1.032x as fast 435.1ms +/- 1.2% 421.6ms +/- 1.2%
(In reply to comment #4) > Is there a way to avoid stack overflow in cases of substantial backtracking? Hi Geoffrey, there was some discussion about stack guards on the preparing patch. You can see: https://bugs.webkit.org/show_bug.cgi?id=42264